September 1958 


Vol. 14 No. 3
JOURNAL OF THE BIOMETRIC SOCIETY 


The First Decade of the Biometric Society C. I. Bliss 


Reconsideration of Methodology in 
Studies of Pain Relief Paul Meier, Spencer M. Free, Jr., 
and George L. Jackson 


Interaction of Genotype and Environment in 
Continuous Variation: I. Description Kenneth Mather 
and R. Morley Jones 


The Analysis of Variance and Derivation of 
Standard Errors for Incomplete Data G. N. Wilkinson 


The Mathematical Foundations Underlying the 
Use of Line Transects in Animal Ecology J. G. Skellam 


Measurement Errors Associated with Obtaining Acreage 
Estimates of Cotton Fields Jack Fleischer, Daniel G. Horvitz, 
J. Malcolm Airth, and A. L. Finkner 


A Sequential Multiple-Decision Procedure for Selecting 
the Best One of Several Normal Populations with a 
Common Unknown Variance, and Its Use with 
Various Experimental Designs Robert E. Bechhofer 


Queries and Notes 


On Varying One Factor at a Time Cuthbert Daniel 


Errata and Extensions for “The Distribution of the
European Corn Borer Larvae Pyrausta Nubilalis
(HBN.) in Field Corn” Judson U. McGuire, Tom A. Brindley,
and T. A. Bancroft




THE BIOMETRIC SOCIETY 


The Biometric Society is an international society devoted to the mathematical 
and statistical aspects of biology. Biologists, mathematicians, statisticians, and 
others interested in its objectives are invited to become members. Through its 
regional organizations the Society sponsors regional and local meetings. National 
secretaries serve the interests of members in Denmark, India, Japan, the Netherlands, 
Sweden, and Switzerland, and there are many members at large. 

Biometrics, the journal of the Biometric Society, is published quarterly. Its 
general objects are to promote and to extend the use of mathematical and statistical 
methods in pure and applied biological sciences, by describing and exemplifying 
developments in these methods and their applications in a form readily assimilable 
by experimental scientists. It is also intended to provide a medium for exchange of 
ideas by experimenters and those concerned primarily with analysis and the develop- 
ment of statistical methodology. 

Original papers, and authoritative expository or review articles or critiques, 
will be accepted for publication in Biometrics if judged consistent with these general 
aims. Predominantly analytical or methodological papers should contribute speci- 
fically to the formulation of quantitative hypotheses, to the interpretation of data, 
or to the planning or analysis of experiments or surveys. Papers dealing with bio- 
logical subjects should report conclusions of definite applicability reached by mathe- 
matical or statistical analysis, so described as to facilitate possible use of the procedure 
in other fields of biology or related sciences. 

Technical notes or problems for consideration under the heading of Queries 
and Notes are invited. 

Information for contributors is given on the inside back cover. 

ANNUAL DUES AND MEMBERSHIPS FOR 1958

Memberships (including dues and subscriptions to this journal)
    Membership (U.S.A. and Canada)                                       $  7.00
    Student membership (U.S.A. and Canada)                                  4.00
    Membership for others                                                   4.50
    Student membership for others                                           2.75
    Sustaining membership (including two subscriptions to this journal)   100.00

Subscriptions
    Non-members of the Biometric Society                                    7.00
    Members of the American Statistical Association                         4.00


All dues, memberships and subscriptions are payable in U.S.A. currency.
Information concerning the Society and memberships can be obtained from its
Secretary, M. J. R. Healy, Statistics Department, Rothamsted Experimental Station,
Harpenden, Herts., England. Non-member subscriptions are — to the Managing
Editor, Biometrics, Department of Statistics, Virginia Polytechnic Institute,
Blacksburg, Virginia.

Members of the American Statistical Association who are currently subscribing 
to Biometrics through that organization may become members of the Biometric 
Society on payment of $3.00 annual dues if resident in the United States or Canada, 
and of $1.75 annual dues if resident elsewhere. All correspondence regarding sub- 
to that Association. 


BUSINESS OFFICE OF BIOMETRICS: Department of Statistics, Virginia 
Polytechnic Institute, Blacksburg, Virginia, U.S.A. Changes of address, non- 
member subscriptions, and undeliverable copies should be sent to this office. 

BUSINESS OFFICE OF THE SOCIETY: 509 West Hill Road, Knoxville, 
Tennessee, U.S.A. 

Second-class mailing privileges authorized at Blacksburg, Virginia, with ad- 
ditional entry at Richmond, Virginia. Biometrics is published quarterly—in March, 
June, September, and December. 




The Biometric Society 


FOUNDED BY THE BIOMETRICS SECTION OF THE AMERICAN STATISTICAL ASSOCIATION


TABLE OF CONTENTS


The First Decade of the Biometric Society
    C. I. Bliss   309

Reconsideration of Methodology in Studies of Pain Relief
    Paul Meier, Spencer M. Free, Jr., and George L. Jackson   330

Interaction of Genotype and Environment in Continuous
Variation: I. Description
    Kenneth Mather and R. Morley Jones   343

The Analysis of Variance and Derivation of Standard Errors
for Incomplete Data
    G. N. Wilkinson   360

The Mathematical Foundations Underlying the Use of Line
Transects in Animal Ecology
    J. G. Skellam   385

Measurement Errors Associated with Obtaining Acreage Estimates
of Cotton Fields
    Jack Fleischer, Daniel G. Horvitz,
    J. Malcolm Airth, and A. L. Finkner   401

A Sequential Multiple-Decision Procedure for Selecting the Best
One of Several Normal Populations with a Common Unknown
Variance, and Its Use with Various Experimental Designs
    Robert E. Bechhofer   408


Queries and Notes

On Varying One Factor at a Time
    Cuthbert Daniel   430

Errata and Extensions for “The Distribution of the
European Corn Borer Larvae Pyrausta Nubilalis (HBN.)
in Field Corn”
    Judson U. McGuire, Tom A. Brindley, and T. A. Bancroft   432


News and Announcements   448


Number 3 September 1958 Volume 14 




BIOMETRICS 
Editor 
Ralph A. Bradley 
Assistant to the Editor: Jane L. Worley 
Editorial Board 


Editorial Associates and Committee Members: C. I. Bliss, Irwin Bross, E. A. 
Cornish, S. Lee Crump, H. A. David, W. J. Dixon, Mary Elveback, J. W. Hopkins, 
O. Kempthorne, Leopold Martin, Horace W. Norton, S. C. Pearce, G. W. Snedecor,
and Georges Teissier. Managing Editor: Ralph A. Bradley. 


Former Editors 


Gertrude M. Cox—Founding Editor 
John W. Hopkins—Past Editor 


Officers of the Biometric Society 


General Officers 


President: C. H. Goulden; Secretary: M. J. R. Healy; Treasurer: A. W. Kimball; 

Council: F. J. Anscombe, C. Barigozzi, W. U. Behrens, C. I. Bliss, G. E. P. Box, 
A. Buzzati-Traverso, L. L. Cavalli-Sforza, W. G. Cochran, G. Darmois, W. J. Dixon, 
Sir Ronald A. Fisher, A. Groszmann, A. Bradford Hill, L. Martin, K. Mather, C. R. 
Rao, P. V. Sukhatme, G. Teissier, J. W. Tukey, E. J. Williams. 


Regional Officers 

Region            President           Secretary           Treasurer
Australasian      C. W. Emmens        P. J. Claringbold   W. R. Sobey
Belgian and
  Belgian Congo   P. P. Denayer       L. Martin           A. H. L. Rotti
Brazilian         C. C. Fraga, Jr.    P. M. Freire        A. Groszmann
British           J. O. Irwin         C. C. Spicer        P. A. Young
E. N. American    B. Harshbarger      T. W. Horner        T. W. Horner
  (J. Cornfield)
French            A. Vessereau        D. Schwartz         D. Schwartz
German            O. Heinisch         W. Ludwig           W. Ludwig
Italian           G. Montalenti       R. Scossiroli       F. Sella
W. N. American    J. L. Hodges, Jr.   M. M. Sandomire     M. M. Sandomire


National Secretaries

Denmark       N. F. Gjeddebæk     Netherlands   E. van der Laan
India         K. Kishen           Sweden        H. A. O. Wold
Japan         M. Hatamura         Switzerland   H. L. LeRoy


THE FIRST DECADE OF THE BIOMETRIC SOCIETY* 


C. I. Bliss


The Connecticut Agricultural Experiment Station and Yale University 
New Haven, Connecticut, U.S.A. 


The first decade of the Biometric Society offers a hopeful contrast 
to the daily news of international tensions, where our very survival may 
depend upon how well and how promptly the world can learn to com- 
municate meaningfully across national boundaries. In this task, 
international non-governmental organizations, among them our Society, 
can play important roles. Without national quotas, the largest single 
national contingent of our regular members, all individual dues-paying 
scientists, constitutes only 39 per cent of the total membership. The 
Society sponsors more than 20 meetings a year in all parts of the world, 
publishes a quarterly journal of growing prestige, organizes periodic 
international conferences, and is financially solvent. How has this 
been accomplished in ten short years? 

1. Antecedents. The first step toward our Society was the formation 
in 1938 of the Biometrics Section of the American Statistical Association. 
Within a few years of its founding, the Section adopted an aggressive 
policy of organizing joint programs with a wide range of biological 
societies, both at annual meetings of the biologists and, less frequently, 
at the annual meeting of the ASA. By the end of 1942, meetings had 
been arranged with the professional organizations in ecology, public 
health, cereal chemistry, pharmacology, biological chemistry, horti- 
culture, and entomology, as well as with the Institute of Mathematical 
Statistics. Biologists interested only in the Section could become 
associate members. 

*A special invited address to the Eastern North American Region of the Biometric Society at
Gatlinburg, Tennessee, April 10, 1958.

With the suspension of national meetings during World War II,
the Section needed other means of maintaining its interdisciplinary
contacts. The ASA Board of Directors was persuaded to publish on
its behalf a bi-monthly Biometrics Bulletin, to be sent to all members
and associate members of the Section. As chairman, it was my
responsibility to find an editor. I have long believed that the telephone call
from New Haven to Raleigh, in which I persuaded Gertrude Cox to 
accept this post, is one of my major contributions to the science of 
biometry. At the end of its first year, the Biometrics Bulletin was being 
sent to 979 regular and associate members of the Section, of which some 
66 per cent listed a biological discipline as their field of interest. 

Concurrently with this development, the ASA named a committee, 
of which I was a member, to write a new constitution. In reviewing 
the likely development of the Association, our experience with the 
Institute of Mathematical Statistics loomed large. The Annals of 
Mathematical Statistics was initially an “affiliate” of the ASA. To 
insure its support, the Institute of Mathematical Statistics was formed 
in September 1935, and the Annals adopted as its official journal. The 
transfer of ownership to the Institute soon followed, and by 1938 there 
was no mention of the American Statistical Association in the mast 
head or on the title page. It seemed to our constitutional committee 
that other specialized groups within the ASA might also form their own 
societies. If so, the ASA might well develop into a federation of sta- 
tistical organizations, and one of our objectives was to draft a constitu- 
tion which would accept this possibility without weakening the parent 
association. 

Following this line of thought, as Chairman of the Biometrics 
Section, I asked a committee headed by A. E. Brandt to draft a tentative 
constitution for an American Biometrics Society which might supplant 
the Section. This Committee reported in January 1946 at the annual 
meeting of the Section. After discussing the pros and cons of organizing 
as a society rather than as a section, we saw not enough gain to change 
our status at that time, so the proposal was tabled. Meanwhile, the 
Section prospered and, beginning with the phenomenally popular issue 
of March 1947, the Biometrics Bulletin became the quarterly Biometrics 
under the vigorous editorship of Gertrude Cox. 

2. The First International Biometric Conference. Perhaps one 
attributes more importance to chance encounters than they deserve, 
but sometimes I speculate on just how much the formation of our 
Biometric Society depends upon such an incident. The story is this. 
The first session of the International Statistical Institute after the war, in 
1947, met in Washington, D. C., and early in the year they issued a 
preliminary program, a copy of which I had seen. On March 29 I 
attended a meeting of the National Research Council Committee on 
Applied Mathematical Statistics at Princeton University. On the 
train from New York to Princeton I ran into Charles Roos, an economist 
and also a member of the Committee, and we rode together to Princeton.
The question of the ISI program arose and I waxed thoroughly indignant
over the small place allotted to biometry. I insisted to Dr. Roos that
biology and biometry had been more responsible for the development of 
statistical science than all the other natural and social sciences combined. 
It seemed to me outrageous that the field which led all others should 
have been treated so shabbily by the program committee. 

Roos’ reply was very simple: “If you don’t like it, all you need do 
is to organize an international biometric society.” At first the idea 
seemed to me impossibly difficult, but as we talked the challenge took 
hold. I realized that here was the missing piece in the puzzle. The 
advantages of an American Biometrics Society seemed not enough 
greater than the Biometrics Section to repay the effort of organization. 
An international biometric society, however, offered prospects not 
available to a section of a national statistical association. Moreover, 
we could call upon the experience of the Econometric Society, as Dr. 
Roos was quick to point out. If we were to form such a Society, the 
coming meetings of the ISI provided a unique opportunity. 

Before leaving Princeton that day I discussed the project with 
John Tukey and others. We agreed that a first step would be to arrange 
an International Biometric Conference. Upon returning to New 
Haven, I wrote Dan DeLury, then Chairman of the Biometrics Section 
of ASA, pointing out the scant attention given to biometry at the 
coming international meetings, and suggesting that he name a Section 
committee both to explore the practicability of a biometric conference 
and, if it proved feasible, to serve as its organizing committee. In his
reply of April 8th, he named an initial committee of four with myself 
as chairman and power to co-opt such additional members as we needed. 
In the exchange of ideas and letters which followed we decided that 
the proposed conference should meet in a distinctively biological setting, 
preceding and quite separate from the statistical conferences in Wash- 
ington. On May 10 and 11, three of our committee members, Tukey, 
deBeer, and myself, met in New York, pooled our information, and 
decided to go ahead. 

With less than four months to complete all arrangements, the task 
was indeed formidable. We at once enlarged the committee to be 
representative of quantitative biology in both its statistical and mathe- 
matical aspects, with two members named by the National Research 
Council. The Marine Biological Laboratory in Woods Hole on Cape 
Cod agreed to serve as host and the Conference was set for September 5 
and 6. We needed money. On June 12 I had a very encouraging inter- 
view with Warren Weaver of the Rockefeller Foundation and in due 
course they allotted $1000 to the Marine Biological Laboratory for our
organizing expenses and the travel of foreign delegates from and to
New York. 

Meanwhile, the Joint Arrangements Committee for the International 
Statistical Conferences in Washington cooperated actively, inviting 
to their meetings the foreign biometricians whom we named. We, in 
turn, invited to our Conference such of their participants as might be 
interested. The Institute of Mathematical Statistics scheduled their 
meetings for August 30 to September 3 in New Haven, making it an 
easy trip to Woods Hole immediately after their sessions. By July 3rd, 
with the cooperation of The Connecticut Agricultural Experiment 
Station, our publicity had included mailing formal invitations to 209 
scientists in 20 countries and sending out announcements for publication. 

Since an initial objective of the Woods Hole Conference was the 
formation of a viable international society, we studied the constitutions 
of the Econometric Society and of other organizations, and wrote and 
debated a succession of drafts. On September 3 an enlarged committee, 
including several of our foreign delegates, met in my office in New 
Haven to review the latest proposal. This defined in as few words as 
possible a society of individual members without national or other 
quotas, and with provision for regional or national subdivisions that 
would be free to carry on the programs that seemed best adapted 
to their needs. It vested final authority in a broadly representative 
Council of 12 to 20 members, each serving a three-year term with one- 
third elected each year by a mail ballot of all members. The Council,
in turn, was to elect a President, Secretary, and Treasurer. Accordingly,
when we went to Woods Hole, our “homework” had been done. 

The Conference opened on September 5, 1947, most members 
arriving the day before. The weather was benign and the setting 
propitious. The Conference was welcomed to the Laboratory by 
Charles L. Packard, then Director. In an initial business session 
G. Teissier, R. A. Fisher, and myself were elected Chairman, Co- 
chairman, and Secretary respectively. Following scientific programs 
on quantitative genetics and recent biometric developments overseas, 
Co-chairman Fisher named a committee of 12 to recommend a suitable 
form of international cooperation in biometry. That evening, after a 
clambake, the committee revised the provisional draft constitution, 
which was then mimeographed. 

At a business session the next morning, September 6, the full Con- 
ference debated and approved the draft Constitution, article by article. 
Following a resolution that the drafting committee and such others 
as it might designate should constitute the first Council of the Society 
and that all persons present or invited to the Conference should be 


i 
nA 


THE FIRST DECADE 313 


charter members if they so desired, the Conference, sitting as the Bio- 
metric Society, adopted the constitution unanimously, and our Society 
was born. 

An afternoon scientific session on allometric growth, the adoption 
of suitable resolutions, and a farewell message by Chairman Teissier 
brought the Conference to a close. The Council met immediately 
afterwards, elected three additional members, and for officers R. A. 
Fisher as President, J. W. Hopkins as Treasurer, and myself as Secretary. 
As Editor of Biometrics, Professor Cox offered to publish our Proceedings 
in the next two issues. After considering an alternative proposal by 
the American Naturalist, the Council accepted Miss Cox’s offer and 
her further offer to use Biometrics provisionally as our journal by means 
of a block subscription with the ASA. 

Following Woods Hole, most Council members attended the Inter- 
national Statistical Conferences in Washington, where we met again 
on September 15. At this session, the Council was further enlarged, 
and four Regions approved with provisional officers. Other motions 
set annual dues for 1948 at $4.00, including a subscription to Biometrics, 
and authorized regional Secretary-Treasurers to retain $1.00 or its 
equivalent for regional use. Finally, the Council adopted the descriptive 
clause which appears on the letterhead of the Society: “An international
society devoted to the mathematical and statistical aspects of biology.” 
The Council had done its part; it was now up to the officers, principally 
the Secretary. 

3. The first years. Soon after returning from the Statistical Con- 
ferences in Washington I reported the formation of the Society to 
my then Director Slate with considerable enthusiasm. He brought 
me to earth with a bump. He assured me that I had let myself in 
for a far larger job than I realized. The Society would have to provide 
a competent Executive Assistant to handle the routine if I were to 
meet my obligations at the Experiment Station. Moreover, he knew 
of no room at the Station in which this assistant and the necessary 
office equipment could be housed. Until we had enough paying members, 
our only hope was to find a financial angel. Since the Rockefeller 
Foundation had got us born, he suggested that they were the logical 
ones to keep the infant alive until it could walk. 

Unfinished business from the Woods Hole Conference more than 
filled the next few weeks. It was not until November 13 that the 
first report of the Secretary was mailed to the Council. This has 
since become our primary method for reaching decisions, the first 
Council “memo” containing, for example, a proposal for electing 
Fellows of the Society, which was turned down. In order to approximate
the give and take of a full Council meeting, any proposal receiving an
explained negative vote has been resubmitted for a second ballot in 
the next memo, often in a revised form and with a summary of all 
comments returned with the first ballot. This has helped keep the 
Secretary in his place, a difficult task when all decisions have to be 
reached by correspondence! 

The following day I again met with Warren Weaver at the Rocke- 
feller Foundation in New York. After reporting on the Woods Hole 
Conference, I told him of our new financial dilemma. Although their 
books were closed for 1947, he was sufficiently encouraging for a grant 
in 1948 that soon after I asked Mrs. Watkins, the wife of Professor 
John H. Watkins, the first Secretary of ENAR, to act as my Executive 
Assistant. A few days later she started work in our first office, a corner 
of the Watkins’ living room. Meanwhile, in preparation for the Christ- 
mas meetings of the ASA and IMS in New York, a few of us in easy 
reach drafted provisional by-laws for the Eastern North American 
Region. 

These New York meetings marked another milestone in the develop- 
ment of the Society. The ASA Board of Directors formally approved 
our using Biometrics as the Society journal at a block subscription 
rate. On December 27 in Chicago and on December 28 in New York, 
organization meetings of ENAR debated and adopted its by-laws, 
subject to Council approval, thus establishing the first Region of the 
Society. 

We also saw the start of a controversy which still continues. One 
group held that, with the formation of the Society and ENAR, the 
Biometrics Section of the ASA was no more needed by the ASA than a 
section on mathematical statistics to duplicate the work of the IMS. 
This view was promptly challenged by other ASA members, who asked 
why the 108-year-old ASA should surrender one of its most active sections
to a Society only months old and as yet unproven. Moreover, ENAR 
was limited geographically to the territory east of the Rockies whereas 
the ASA’s jurisdiction extended from coast to coast. The present 
position is that all sessions on biometry at the annual meetings of the 
ASA are sponsored by both the Section and the Region, while the 
Region organizes other joint sessions such as had been handled formerly 
by the Section. 

Quite apart from the meetings themselves, the Society advanced 
on another front. After an invaluable briefing by Sam Wilks, I visited 
the Rockefeller Foundation on December 30 and talked with Mr. 
Chadwell, from whom I learned that a three-year grant in diminishing 
amounts would be in line with their policy, and that amounts less than
$7500 did not need separate action by their Board of Directors. With
this background, I asked Gertrude Cox how to ask Foundations for 
money and together we worked out a three-year budget for spending 
$7400. In submitting it to our Council on January 5, I reported a 
membership of 143 but predicted a rise to 2500 by 1951, 1000 more 
than we have reached to date. The final proposal was approved and 
submitted a month later. As of March 1, the Rockefeller Foundation 
granted $7400 to Yale University to be expended on behalf of the 
Society as recommended by its Secretary. Since we were not yet 
recognized as tax exempt, they could not give us the money directly. 

With the grant we purchased desk, file, and other much-needed 
equipment, all duly installed in the Watkins’ living room, and began 
a concerted drive for new members. Additional regions were organized 
in rapid succession, the British Region in May, the Western North 
American Region in July, the Australasian Region in November, a 
French-Italian Region the following February, and an Indian Region 
the next May. Early in this development, a question arose over the 
relation between the Society and its Regions, primarily over their 
jurisdiction in respect to the individual member. This and other 
policies were resolved with the adoption of Council By-laws, dated 
July 12, 1948. 

Operations in the central office suffered a serious set-back in Sep- 
tember by the sudden death of Jack Watkins, the subsequent resignation 
of Mrs. Watkins as Executive Assistant, and the necessity of finding 
new quarters. Early in November we moved temporarily to a room 
in the Department of Public Health at Yale, but by June were office 
hunting again, in competition with University deans who needed space 
for new professors. To their surprise, we found a very adequate room 
in the Laboratory of Applied Physiology at 52 Hillhouse Avenue, where 
the Society had its headquarters for the next six years. When our first 
Directory went to press in July 1949, the organizing drive conducted 
by the Regions and by the Secretary’s office had brought our member- 
ship to 900. 

The publication of our first Directory completed the second stage 
in the history of the Society. Its further development may be traced 
under five headings: regional organization, international affiliations, 
international conferences and symposia, Biometrics, and membership 
composition and growth. 

4. Regional organization. In our regional organization we have 
had to break new ground. Our objectives were two-fold: (1) to give 
each member a greater opportunity for direct participation in Society 
activities, and (2) to minimize exchange difficulties. At the same
time each Region needed sufficient autonomy to develop the pattern
best adapted to its needs. The problem has been solved in varying ways. 

In its first ten years, our largest region, ENAR, has sponsored some
37 meetings, each with one to ten scientific sessions. Nearly all of 
these have been held jointly with other organizations. At the annual 
meetings of the ASA each December or September, the Region has 
co-sponsored an average of seven sessions with the ASA Biometrics 
Section, the IMS, or both. In December 1950, ENAR completed its 
formal affiliation with the ASA and to this date it is the only affiliated 
society in the ASA. Since 1950, the Region has met each spring with 
the IMS for an average of five sessions per meeting, all on the eastern 
seaboard and ranging from Princeton to Gainesville. 

Meetings with biological groups have had fewer sessions. At five 
spring meetings with the Federation of American Societies for Experi- 
mental Biology, four with the Pharmacologists, and one with the 
Immunologists, each single joint session has had a relatively large
attendance. At five Christmas meetings of the AAAS and five Sep- 
tember meetings of the American Institute of Biological Sciences, 
first organized in 1950, the Region has averaged about three sessions 
per meeting, many of them held jointly with their affiliated societies, 
including the Ecologists, Geneticists, Horticulturists, Naturalists, 
Phytopathologists and Plant Physiologists. At three meetings of the 
American Public Health Association, ENAR has co-sponsored one or 
two sessions. A separate clinic session in December 1948 with the 
Entomologists was recorded and issued later as a multilithed bulletin
by the Secretary’s office. A two-day session with the New York Academy 
of Sciences in 1949 formed a 153-page issue of the Academy Annals. 
Most impressive of all, a five-week summer Biostatistics Conference 
at Iowa State College in 1952, co-sponsored by ENAR, was published 
in a 600-page volume by the Iowa State College Press. 

Since its formation in 1948, the Western North American Region 
has held eleven meetings, usually of one to three days in June, and 
ranging from Seattle in the north to Pasadena in the south. Besides 
the IMS, co-sponsoring organizations have included the ASA, the 
Ecological Society, the AIBS, and the Mathematical Society, the 
number of sessions per meeting ranging from one to four. 

As no foreign exchange was involved, members in ENAR and WNAR 
paid their dues (until 1955) directly to the general Secretary; elsewhere, 
except for members at large, dues were collected by the Regional 
Treasurers and National Secretaries. In 1949 and 1950 many currencies 
were devalued relative to the dollar, which automatically raised the 
dues of all members in those countries. Moreover, their incomes were
already substantially lower than in the United States and Canada.
When the Rockefeller grant ended early in 1951, our annual dues of 
$4.50, adopted in 1949, were clearly inadequate and were boosted 
differentially in ENAR and WNAR to $7.00, beginning in 1951. At the 
same time we set up a new class of “sustaining members” at $100 per 
year, for organizations actively interested in the objectives of the 
Society. With one exception, all of our sustaining members have 
been from ENAR and have never numbered more than nine. We could 
do with more members in this class. 

The British Region also dates from the first year of the Society 
and has had 30 meetings in its ten years, usually half-day or one-day 
sessions in London. Only exceptionally have they been co-sponsored 
with other organizations, on one such occasion with two chemical 
societies. A summer meeting in 1952 at Edinburgh lasted two days, 
unusually long for the British Region. They have streamlined their 
collection of dues by the unique British contrivance of “Banker’s 
Orders,” by which a member can instruct his bank to transfer his 
annual dues to the account of the Society on a given date each year 
without further notice, a device which less lucky treasurers must envy. 

Quite a different organizational pattern has developed in France. 
In accord with a law of 1901 governing official French societies, the 
French biometricians formed in 1949 the separate Société Française de
Biométrie and at the same time the Région Française of the
Biometric Society, with the proviso that all full members of the Société
Française must be members of the Biometric Society. The French
Society-Region has met two or three times each year in Paris, usually 
at the Zoological Laboratory of the Sorbonne, in half-day meetings 
with one to three papers and discussion. Although the regulation has 
since turned up in other countries, the French were the first to require 
a signed and certified annual bill in triplicate from the Society Treasurer, 
listing the name of each member owing dues payable in dollars. With 
this bit of formality, they have been able to obtain the necessary 
foreign exchange. 

The Region for Belgium and the Belgian Congo, organized on 
December 3, 1952 as the Société Adolphe Quetelet, has followed the 
French pattern. Its one to three meetings a year in Brussels have 
featured speakers from other countries more often perhaps than any 
other Region. This highlights another advantage of our regional 
organization, the opportunity it affords for visiting scientists to meet 
associates interested in biometry. 

Our members in the Australasian Region have had to cope with 
long distances, so that their activities have concentrated on the biennial 
meetings of the Australian-New Zealand Association for the Advance- 
ment of Science. Since its organization in 1949, four meetings with the 
ANZAAS are on record, in Sydney, Canberra, Melbourne, and most 
recently at Dunedin, New Zealand, plus a few meetings of the Region 
alone in Melbourne and elsewhere. In recent years, proportionately 
more members have been enrolled in this Region than in any other of 
similar age; in the same period the President of the Society has been 
appropriately an Australian, E. A. Cornish. 

The Italian Region met first in March 1951 at the University of 
Milan and annually since in Florence, Rome, Pavia, and elsewhere 
in one-day sessions. The Region was host to the Third International 
Biometric Conference in 1953, a major and most successful operation, 
thanks to the skill of the Regional Secretary, L. L. Cavalli-Sforza. 
In 1955 the Italian Region broke new ground under the same leadership 
by organizing an International Seminar on Biometric Methods at 
Varenna on Lake Como. From nearly 100 applicants, 56 students were 
accepted, all but one an Italian resident. A concentrated program on 
modern statistics in biology completely filled the fourteen working 
days of the course. Encouraged by its success, the Region sponsored a 
second course on biometric methodology in Milan on October 8-20, 
1956, again with both lectures and laboratory. 

The first steps towards the formation of a German Region in Frank- 
furt in September 1953, were followed by a successful three-day biometric 
conference at Bad Nauheim in January 1954. A year later at a second, 
similar biometric conference, the Region was formally inaugurated and 
has since met once or twice annually, usually with two or three day 
meetings in January. It is now our fourth largest Region. 

Our youngest region, the Brazilian, was organized in January 
1956 following preliminary steps during the international Biometric 
Symposium at Campinas the preceding July. It has since met annually 
in São Paulo and in July 1956 with the Brazilian Association for the 
Advancement of Science. 

Our policy has been to encourage the formation of a Region as 
soon as we had 30 to 35 members within a nation or natural area. 
Only one region has been formed and later abandoned. An Indian 
Region was formed during a meeting of the Indian Science Association at 
Allahabad in January 1949. With 46 members listed in the 1949 
Directory, it met again in Poona a year later. When the rupee was 
devalued, the Indian Region was unwilling to raise its dues to cover 
the devaluation, but in fairness to our other members, the Society 
could not make an exception. After 1950 only a handful continued as 
members at large, but by 1952 we had recruited enough more to name a 
National Secretary for India and this is their present status. 


The problem of converting national currencies to dollars for transfer 
to the central office of the Society was only partially solved by the 
formation of Regions. In 1949, the Council approved the appointment 
of a National Secretary in any country with 10 or more members. 
Each year he collects the dues of his compatriots and arranges at least 
one meeting with a biometric program. In return, he retains the equiv- 
alent of one dollar per member toward his expenses, the same amount 
as each Regional Treasurer,* sending the remaining $3.50 to the Society 
Treasurer. 

In December 1949, Secretaries were named for Denmark, Italy, 
and the Netherlands, and later for Belgium, Germany, India, Japan, 
Sweden, and Switzerland. In Italy, Belgium, and Germany the National 
Secretary has been supplanted by a Region, as noted above. Although 
their numbers were sufficient by 1953, our members in Japan have 
preferred to continue with a National Secretary as the “Chapter of 
Japan.” After their second meeting and again after their third and 
fourth meetings, they have published 50 to 60 page proceedings from 
typewritten copy, partly in Japanese and partly in English. Our 
members in the Netherlands have met with two other Dutch biometric 
groups, usually in Utrecht, in a total of eight or more meetings. 

Members in Switzerland have not only convened annually since 
1954, but in July 1956 held a five-day biometric seminar for agronomists 
in Zurich with 31 participants and five lecturers. They have also been 
instrumental in arranging a two-week international Biometric Seminar 
and Symposium at Linz, Austria, beginning September 24, 1956. Its 
program was patterned after that held the preceding year in Varenna 
and the participants numbered 150 from 11 countries. 

In the first ten years of our existence, the regional subdivisions of 
the Society have organized 152 or more meetings around the world, 
two-thirds of them in the last five years. These meetings represent 
varying numbers of sessions, ranging from a single session with one 
paper to ten sessions, each with three or more papers. I know of no 
other international scientific organization that can match this record. 

5. International affiliation. In seeking the appropriate international 
scientific body with which to affiliate, we turned first to the International 
Council of Scientific Unions (ICSU). Our organizational pattern, 
however, as an individual membership society, differed too sharply 
from that of its constituent unions for direct affiliation. In 1948 the 
25-year old International Union of Biological Sciences (IUBS), a 
member of ICSU, invited the Society to provide the Secretariat of a 
new section on biometry, to which we agreed, and in 1952, following a 
change in policy, the Society itself became the Biometric Section of 
the IUBS. 

*Except in ENAR, which has been able to operate since 1950 with only 25 cents per member for 
all over the first 100. 

The Society has been represented and vocal in the last three General 
Assemblies of the IUBS. At its 10th Assembly in Stockholm in July 
1950, President Linder and B. Matérn summarized our activities 
during the preceding three years and reviewed plans for the future, 
including a project on the teaching of biometry. At the 11th Assembly 
in Nice in August 1953, attended by R. A. Fisher, M. Lamotte, A. 
Linder, and myself, we again reported on the activities and plans of the 
Society. Early in 1955, a preliminary report on courses in biometry 
was distributed to key members of the Society and revised with the 
aid of their comments. The final statement on “Biometric Needs and 
Opportunities in Biological Education” was presented during the 
12th IUBS Assembly in Rome in April 1955, attended by L. L. Cavalli- 
Sforza, A. Vessereau, and A. Linder, where our organizational pattern 
was strongly commended by President Horstadius. 

At all three Assemblies the Biometric Society was voted financial 
support, in sums ranging from $100 for minor projects to $2500 toward 
an International Conference or Symposium, largely from UNESCO 
grants. During the 12th Assembly, the sections of the IUBS were 
regrouped under Plant Biology, Animal Biology, and General Biology. 
In this last division the Society forms the Section of Biometry, the 
others being Cell Biology, Genetics, Microbiology, and Limnology. 

As already noted, our association with the International Statistical 
Institute (ISI), founded in 1885, began in 1947 during the First Inter- 
national Biometric Conference. This led to our formal affiliation with 
the ISI in December 1948, each organization being represented in the 
other by its Secretary. The cordial cooperation of the ISI and the 
interest of many of our members in its activities have been largely 
responsible for the timing of most of our international meetings immedi- 
ately before or after the biennial Sessions of the ISI, usually in a nearby 
city. In addition, the Society has sponsored individual programs at 
ISI Sessions, there being three of these at the Stockholm meetings in 
August 1957, on problems of experimentation, statistical genetics, and 
statistics in medical research respectively. The very considerable 
travel funds available for delegates to the ISI have made it possible for 
our Biometric Conferences to obtain speakers from a wider geographical 
area than would otherwise be possible. 

The World Health Organization is the third international body 
with which the Society is associated. Since our affiliation in 1949, 
representatives of the Society have attended a number of their inter- 
national and regional assemblies. This is a collaboration which might 
well be enlarged in view of the wide-ranging activities of the WHO in 
areas of mutual concern. 

6. International conferences and symposia. In discussing our first 
conference in Woods Hole, I described some of its problems, most of 
which have reappeared in the four international meetings which the 
Society has since sponsored. Of these, one of the most important is 
finding an able secretary (or chairman) for the local arrangements 
committee, whose task includes raising a sizable kitty from local and 
national sources. In speed of publication, our biometric conferences 
rate better than most international congresses, the general proceedings 
and abstracts appearing in the next few issues of Biometrics, together 
with a sizable proportion of the original papers. Reprints of these 
papers and reports have then been assembled and bound separately, 
often with the aid of subsidies from the IUBS and other sources. Since 
these are readily available, I need only summarize them in order. 

The Second International Biometric Conference met on August 30 
to September 2, 1949 at the University in Geneva, Switzerland, with 
President Fisher presiding. At the opening session, Professor A. 
Franceschetti addressed the Conference successively in French, English, 
and Italian and, after an afternoon program on experimental design, 
entertained us for tea at his estate on Lake Geneva. Morning and 
afternoon sessions in the next three days concerned recent applications 
of biometric methods in genetics, biometric aspects of biological assay, 
the present status of biometry, industrial applications of biometry, 
and teaching and education in biometry, concluding with a session of 
contributed papers. Evening receptions by the Canton and Town 
of Geneva, opening and closing business meetings, and two Council 
meetings completed the program. The 103 delegates came from 19 
different countries, with the largest national delegation of 18 from 
Great Britain. On September 3, many of us traveled from Geneva to 
Berne for the ISI meetings on a special all-day tour through some mag- 
nificent scenery. Much of the success of the Conference was due to 
the splendid work of the local secretary, A. Linder. 

The next international session of the Society was a symposium 
on December 17-18, 1951, on “Biometric problems in the prediction 
and estimation of the growth of plants in tropical and sub-tropical 
regions.” The Indian Statistical Institute in Calcutta was host, Presi- 
dent Linder arranged the program and presided, and about 150 persons 
attended. The three papers presented on the first evening were followed 
by two more the following morning, in each case with extensive dis- 
cussion. The symposium was sponsored by the Society in its dual 


| 
a ne 
2 
ee: 
| 
\ 
1 
3 
‘ Ae 


322 BIOMETRICS, SEPTEMBER 1958 


role as a Society and as a Section of the IUBS. The 116-page Proceed- 
ings were published later in Calcutta for the IUBS. 

The Third International Biometric Conference met on September 
1-5, 1953, at the Hotel Grande Bretagne in Bellagio, Italy, on Lake 
Como, immediately following the Ninth International Genetic Congress, 
also in Bellagio, and just preceding the 28th Session of the ISI in Rome. 
After speeches of welcome by our Italian hosts, Professor Darmois 
opened the scientific program with his presidential address. Morning 
and afternoon sessions on each day were concerned with the first course 
in biometry, mathematical problems in genetics, methodological 
problems in biometry, biometry in immunology, biometric methods in 
agriculture, functional relations in experimentation, contributed papers, 
and industrial applications of biometry. A series of exhibits on the 
second evening, a meeting of Council and two short business sessions 
completed the work of the Conference. For diversion, we were enter- 
tained by a motorboat excursion on beautiful Lake Como, an evening 
party at the Lido, and a banquet. Of 125 participants in the Conference, 
representing 24 different countries, 101 were members of the Biometric 
Society. The primary burden of this session was carried superbly by 
L. L. Cavalli-Sforza of Milan. 

Our next international meeting, a symposium on “The role of 
biometric techniques in biological research” in Campinas, Brazil, 
on July 4-9, 1955, followed the 29th Session of the ISI near Rio de 
Janeiro. The program committee was chaired by President Cochran 
and the local committee by C. G. Fraga. The University of São Paulo 
served as our host. Registration totalled 98 from 17 different countries, 
with 62 from Brazil. Professor Cochran opened the scientific program 
with his Presidential Address. The following sessions were devoted to 
biometrical genetics, experimental designs for perennial crops, other 
experimental designs, statistics applied to animal-feeding experiments, 
sampling techniques, bioassay, and medical statistics, plus two panel 
discussions. On the lighter side, we enjoyed an excursion to Piracicaba, 
two social evenings, and a visit to a dairy and coffee farm. One result 
of the symposium was the formation of the Brazilian Region the follow- 
ing January, our first in Latin America. 

In any evaluation of international meetings, I would stress again 
their invaluable role in promoting international understanding through 
personal discussions of common problems and the good will generated 
by the unstinting hospitality of our hosts. 

7. Biometrics. A major factor in the success of a scientific society 
is its journal, for many members the only tangible return from their 
annual assessment. When our Society was formed, we were fortunate 


‘4 
a8 


THE FIRST DECADE 323 


in obtaining space in Biometrics for Society news and in arranging a 
block subscription for our members. Conversely, Biometrics, then in 
its third year and with the struggles of the first two years still vividly 
in mind, stood to gain more subscribers and a new source of articles. 
Even in our first year, however, it was evident that this was only a 
temporary solution and that the Society would need its own journal. 
Since all members of the editorial board of Biometrics were charter 
members of the Society, our first choice was a transfer of Biometrics 
from the ASA to the Society, following the precedent established in the 
preceding decade by the transfer of ASA rights in the Annals of Mathe- 
matical Statistics to the IMS. 

In accord with a resolution of ENAR at its business meeting in 
December 1948, the Council of the Society voted in February 1949 
to negotiate with the ASA for Biometrics. To implement this decision, 
President Fisher named a special committee consisting of Gertrude Cox, 
John Hopkins, and myself. Tentative terms of the transfer were 
submitted to Council in May 1949, approved by a vote of 21 to 1, and 
reconfirmed at a Council meeting during the Geneva Conference. The 
transfer was also approved by a large majority of members of the 
ASA Biometrics Section in a mail ballot during the summer of 1949, 
and confirmed unanimously at its annual meeting the following Decem- 
ber. The ASA Board of Directors and Council then approved the 
transfer and named a negotiating committee to arrange its terms. 
These were completed in legal form, signed in August 1950, and pub- 
lished in Biometrics for December. Starting with Volume 6, Biometrics 
became the property and official journal of the Society. The Society 
was doubly lucky in Miss Cox’s consenting to continue as editor, 
with the management of the journal remaining at the Institute of 
Statistics in Raleigh. 

The Editorial Board was at once enlarged on an international 
basis and a statement of policy adopted after discussion in Council 
memos. The refereeing of all papers before acceptance was continued, 
except for papers delivered at an International Conference and approved 
by its organizers, when the cost of publication was subsidized separately. 
Financial accounts for Biometrics and for our other activities have been 
kept separately from the start. When our American and Canadian 
members were charged $7.00 per annum, the price of Biometrics to 
non-member subscribers, other than the ASA block subscription, was 
also increased to $7.00. By 1957, its circulation had increased to over 
2800, including 890 non-member subscribers. This growing circulation, 
together with the sale of back issues, has made Biometrics self-supporting 
and built up a reserve, even though 60 per cent of Society members 


| | 4 
| 
| 
| 
7 
: 


it 


324 BIOMETRICS, SEPTEMBER 1958 


pay less than the cost of printing and mailing their copies. Any uncer- 
tainties as to our ability to publish Biometrics, which led to a stipulation 
that it revert to ASA if discontinued within five years, have been laid 
decisively at rest. 

Biometrics has also had its problems. One is the periodic complaint 
by our biological members that the journal is becoming too “high brow” 
statistically for them to understand, and the counter-complaint of the 
Editor that good biological, less technical manuscripts are hard to come 
by, despite numerous pleas for material. After all, editors cannot 
accept papers that are not submitted for publication. When Professor 
Cox resigned as Editor, she was succeeded by J. W. Hopkins of the 
National Research Council of Canada, the transfer being completed 
during 1955. Early in 1957, Dr. Hopkins suffered a protracted illness 
which necessitated a second transfer of the editorial offices and back 
issues, with attendant delays. This has given us our present Editor, 
Dr. Ralph Bradley, a Canadian by birth and early education, with 
editorial offices in the Department of Statistics of the Virginia Poly- 
technic Institute at Blacksburg. 

Although Biometrics is an international journal, most of its papers 
are in English, and over the years it has become more technical and 
advanced statistically. In 1953 Dr. Leopold Martin, then Secretary 
of the Belgium Region, proposed starting a second journal to be called 
Acta Biometrica, with more emphasis on quantitative biology and less 
theoretical statistics. It was to appear primarily in languages other 
than English and draw upon papers given at Regional meetings in 
continental Europe. His proposal was discussed at length by members 
of Council and others. Although in sympathy with the objectives of 
the proposed journal, we were unable to solve essential details, such 
as its editorial policy, relation to Biometrics, and financing, so that the 
proposal was never implemented. 

8. Membership composition and growth. The membership record 
has been a primary responsibility of the Secretary. A member in 
arrears for dues is removed from the mailing list for the June Biometrics 
and after further notices, he is assumed to have resigned if still unpaid 
by the following year. Since the dues of American and Canadian 
members were collected initially by the Secretary, the two offices were 
combined in 1951. By 1955 our Regional Treasurers and National 
Secretaries were collecting all dues, except those from members at 
large, so that with the election of M. J. R. Healy of England as Secretary 
and the decision to keep the general Society funds in dollars, the two 
offices were again separated. In 1957, when our present Treasurer, 
Allyn Kimball, took over, the Society’s accounts were moved to Ten- 
nessee. 


From the start, the Secretary has been responsible for publishing 
a Directory which would include information on member activities and 
interests. This has proved a much larger project than we first antici- 
pated, so that directories were issued only in 1949 and 1953 from the 
New Haven office and in 1957 from Harpenden. The member interests 
of the 900 listings in 1949 were analyzed by John Tukey in Biometrics 
for June 1950; a more extended study of the 1144 members in 1953 by 
Colin White appeared in Biometrics for December 1954. A comparable 
study of our 1957 Directory would be most welcome. The following 
summaries are pertinent. 

The geographic as distinct from Regional distributions in Table 1 
show a steady overall gain in membership, especially in areas outside 
the United States. These are based upon the first two directories and 
a count of members in good standing at the end of 1957. Our deficiencies 
are most marked in Latin America (other than Brazil), in Asia, and 
especially in countries of the Communist bloc. The 1957 Directory 
lists five or more from East Germany, one from Czechoslovakia, two 
from Rumania, and one from the USSR. To promote a better under- 
standing between East and West, members from this area would be 
doubly welcome. 


                              TABLE 1 
                     GEOGRAPHICAL DISTRIBUTION 

                          No. of Members           Percentage 
Area                    1949   1953   1957*     1949   1953   1957 
USA                      547    557    547      60.8   48.6   38.8 
Canada                    23     35     34       2.5    3.0    2.4 
Great Britain            107    139    174      11.9   12.1   12.3 
Australasia               38     43     69       4.2    3.7    4.9 
France                    45     50     63       5.0    4.4    4.5 
Italy                     24     53     73       2.7    4.6    5.2 
Germany                    1     22     97       0.1    1.9    6.9 
Belgium                    0     63     48       0.0    5.5    3.4 
Other Europe              32     64    104       3.6    5.6    7.4 
Brazil                     7      9     70       0.8    0.8    5.0 
Other Latin America       19     25     21       2.1    2.2    1.5 
Japan                      0     38     45       0.0    3.3    3.2 
Other Asia                50     22     19       5.6    1.9    1.3 
Rest of World              7     27     46       0.8    2.4    3.3 
Total                    900   1147   1410 

*Members in good standing at end of 1957. 


Member interests present a problem in classification where no 
two people are likely to agree. In Table 2, I have attempted to sum- 
marize the analyses of our 1949 and 1953 directories in approximately 
equivalent categories. Many members have overlapping interests in 
two or more fields and these have been prorated fractionally. The 
biggest change from 1949 to 1953 is the increase in statistics, a trend 
which is reflected in our ENAR programs and probably also in the 
contents of Biometrics. I wonder whether we are reaching the biologists 
as effectively as we should, if we are to live up to our professed objective 
as a “society devoted to the mathematical and statistical aspects of 
biology.” 


                        TABLE 2 
                    MEMBER INTERESTS 

                            1949             1953 
Interest                 No.      %       No.      % 
a. Mathematics            86    10.3      108     9.5 
   Statistics            144    17.2      371    32.5 
b. General biology       183    21.8      192    16.8 
   Applied biology        92    11.0      148    13.0 
c. Medical sciences      142    16.9      158    13.8 
d. Human biology          34     4.1       39     3.4 
   Public health          84    10.0       72     6.3 
e. Other                  73     8.7       54     4.7 
   Not identified         62      -         2      - 
Total                    900   100.0     1144   100.0 

Although we have almost no restrictions on membership, the educa- 
tional and employment status of our members certainly qualifies us 
as a “learned society,” judging from the 1953 Directory. As summarized 
in Table 3, some 55 per cent of our members have a doctor’s degree and 
there are probably very few if any without a bachelor’s degree or its 
equivalent. Educational and governmental institutions employ three- 
fourths of our members and a large proportion of the others are engaged 
in research or development, whether in non-profit institutions, such as 
hospitals and research institutes, or in the research laboratories of 
private industry. In these two aspects we can be reasonably content. 

                              TABLE 3 
              EDUCATIONAL AND EMPLOYMENT STATUS, 1953 

Academic degree                   %   | Type of employer      % 
Medical (M.D., M.B., D.P.H.)     9.6  | Educational         43.0 
Other Doctorates                45.7  | Governmental        31.8 
Masters                         15.4  | Private industry    14.9 
Bachelors                       13.6  | Non-profit           5.4 
Other and not given             15.6  | Other                4.9 
Total                           99.9  | Total              100.0 


In another area, however, we have little ground for smugness. 
This is in our ability to hold members. I have compared the alpha- 
betical listing in the Directories for 1949 and 1953 and similarly for 
1953 and 1957, with the results shown in Table 4. Membership gains 
have been impressive but our losses disquieting. Most of our members 
belong to a number of scientific organizations, in some cases ten or 
twelve, and if they see no tangible benefit from their membership, 
they are not likely to continue. Of the 900 members listed in 1949, 
37 per cent were missing in 1953 and of the 1144 members listed in 
1953, 29 per cent were missing in 1957. Fortunately, the rate of loss 
is diminishing but it still seems too high to be viewed with complacency. 


                              TABLE 4 
                    MEMBERSHIP GAINS AND LOSSES 

                               Number of members 
                   1949    1949 +   New in   Lost from   1953 +   New in 
Area               only     1953     1953      1953       1957     1957 
USA and Canada      225      345      247       151        441      148 
Other               112      218      334       183        369      398 
Total               337      563      581       334        810      546 

                     % loss from         Net % gain 
Area                1949     1953      1953     1957 
USA and Canada      39.5     25.5       3.9     -0.5 
Other               33.9     33.2      67.3     38.9 
Total               37.4     29.2      27.1     18.5 

If we subtract the losses from the gains in successive directories, 
we obtain the net percentage gains in the lower part of Table 4. In 
1953 the net gain in the two North American Regions was not quite 4 
per cent, and in 1957 they had a loss of 0.5 per cent. In contrast, the 
corresponding net gains in the rest of the world were 67 and 39 per 
cent. These figures contrast with the growth of our non-member 
subscribers to Biometrics, who numbered about 400 in mid-1951 and 
890 in 1957. There is evidently no lack of interest in biometry and its 
applications, as judged from a more than doubling of the number of 
subscribers in six years, almost entirely through the efforts of the 
Editor’s office. The increase of 900 to 1400 members in the longer 
period from 1949 to 1957 suggests that we have by no means exhausted 
our opportunities for growth. 
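
The percentages discussed above can be checked directly from the member counts in Table 4. The sketch below is illustrative only: the function names are hypothetical, and the formulas (losses divided by the earlier Directory's total, and new members minus losses divided by that same total) are an assumption inferred from the printed figures, which they reproduce.

```python
# Counts transcribed from the upper part of Table 4.
# Tuple order: (lost, retained, new) for each pair of successive Directories.
TABLE_4 = {
    "USA and Canada": {"1949-1953": (225, 345, 247), "1953-1957": (151, 441, 148)},
    "Other":          {"1949-1953": (112, 218, 334), "1953-1957": (183, 369, 398)},
}


def pct_loss(lost, retained):
    """Members missing from the later Directory, as a per cent of the earlier one."""
    return round(100 * lost / (lost + retained), 1)


def net_pct_gain(lost, retained, new):
    """New members minus losses, as a per cent of the earlier Directory (assumed formula)."""
    return round(100 * (new - lost) / (lost + retained), 1)


def summarize(table):
    """Return {area: {period: (pct_loss, net_pct_gain)}} for every row of the table."""
    return {
        area: {
            period: (pct_loss(lost, kept), net_pct_gain(lost, kept, new))
            for period, (lost, kept, new) in periods.items()
        }
        for area, periods in table.items()
    }


if __name__ == "__main__":
    for area, periods in summarize(TABLE_4).items():
        for period, (loss, gain) in periods.items():
            print(f"{area:15s} {period}: loss {loss}%, net gain {gain}%")
```

Under these assumptions the North American figure for 1953 to 1957 comes out as a small net loss, while the rest of the world shows a substantial net gain in both periods, in agreement with the text.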

The second decade. How can we make our second decade even 
more fruitful than our first? I have already mentioned several oppor- 
tunities for development, especially in enlarging and holding our 
membership. This is primarily a project at the regional and national 
level. Our contacts are now more fully developed with organizations 
in statistics than in the various biological disciplines. Additional joint 
sessions, primarily at biological meetings, would contribute to closer 
and more varied contacts. Because of its interdisciplinary character, 
the Biometric Society can support a program of international conferences, 
congresses, and symposia at closer intervals than the usual three to five 
year gap between the international congresses in subject matter fields. 

Paralleling our growing activity at scientific meetings, we will 
need an expanded publication policy, initially, at least, by enlarging 
Biometrics. A welcome addition would be a section of book reviews for 
books on statistical methodology and quantitative biology. Because 
articles of biometric interest are published in a wide range of media, 
including experiment station bulletins, journals in agriculture, medicine, 
other fields of biology, psychology, and the physical sciences, as well as 
in statistical and mathematical journals, tracking down the literature 
is even more complex in biometry than in other sciences. This task 
would be aided materially by including in each issue one or more review 
articles on recent advances in a specialized field, both in methodology 
and in the various areas of application. These could be patterned, in 
part, after those in the Annual Reviews of Pharmacology, Entomology, 
etc. or in Physiological Reviews. A year or more in advance the editor 
for the series would invite a specialist in each field to review the recent 
advances in his area and provide an adequate bibliography. 

Proceedings of our international meetings, which now appear in 
several issues of Biometrics, might be published in a special supplement 
and made available for separate distribution. Even though English 
has become the most nearly universal scientific language, our Society 
would have a greater impact if adequate summaries of the articles in 
Biometrics could be issued in other languages, preferably as joint 
projects of individual Regions and Biometrics. 

Where instruction in modern biometry is not now available, Society- 
sponsored seminars, such as those held at Varenna and at Linz, could 
be effective in encouraging universities to institute suitable programs. 
These should include special summer courses for students at various 
levels, such as are now current in parts of the United States. The 
Society could help maintain their quality by preparing general recom- 
mendations as to content, prerequisites, and suitable texts, that would 
be available on request. 

As our membership and influence grow, we may anticipate assign- 
ments which we would be better able to fill than any other organization. 
We may be asked to recommend referees for articles appearing in other 
journals but which require competent biometric refereeing. Biometri- 
cians seem to be more mobile internationally than many other scientists, 
calling for an employment exchange within the Society on an inter- 
national as well as a regional basis. We may find ourselves advising 
international and national bodies on the feasibility of specific projects 
and then participating in their execution. 

None of these suggestions is revolutionary. Their very obviousness 
should make them logical developments of our first decade. They still 
leave plenty of scope for proposals by younger, more imaginative minds. 



RECONSIDERATION OF METHODOLOGY IN STUDIES 
OF PAIN RELIEF 


PAUL MEIER 
Department of Statistics, University of Chicago, Chicago, Illinois, U.S.A. 
SPENCER M. FREE, JR. 
Smith, Kline, & French Laboratories, Philadelphia, Pennsylvania, U.S.A. 


GEORGE L. JACKSON 
1919 North Front Street, Harrisburg, Pennsylvania, U.S.A. 


INTRODUCTION AND SUMMARY 


In the comparison of drugs used for the relief of pain, the standard 
practice is to arrange the experiment so that “each patient acts as his 
own control” (Beecher [1957], Keats [1957]). In the language of agri- 
cultural field trials, the patient is the block, and the number of plots per 
block is the number of trials which can be made on a single patient. 
The thinking behind this tradition is straightforward: patients ordinarily 
differ in the amount of pain which they actually feel and in their toler- 
ance for pain, and it may be expected that estimates of differences 
between drugs will be more precise if they are tested on the same patient 
than if they are tested on different patients. This expectation is cer- 
tainly plausible, but we have been unable to find references which give 
data in sufficient detail to permit investigation of this point. 

The study reported here was arranged in accordance with these 
ideas and was therefore designed to be analyzed on a within patient 
comparison basis. However, examination of the results indicated that, 
at least for the type of pain studied here, the variation between patients 
at a fixed time after operation may actually be less than the variation 
shown by a single patient at different times. Thus, the use of each 
patient as his own control may result in a loss rather than a gain in 
precision. 

Furthermore, the ordinary analysis as a randomized block experi- 
ment seems not to be valid in this case, whether there is a gain in 
precision or not. The original intent was to analyze this experiment as 
a randomized incomplete block design and to take advantage of the 
recovery of inter-block information, if that should turn out to be
worthwhile. When the data were examined it was found that duration
of relief increased markedly with time after operation. Under these 
circumstances the assumptions required for the validity of the least 
squares analyses are violated, so a valid alternative (and somewhat 
simpler) analysis was used instead. However, it may be of interest to 
see how far the failure of the assumptions distorts the results and the 
least squares analysis is presented also for comparison. 

Finally, in view of the time trend, the question arose whether a simple 
comparison between drugs based on the first treatment period only 
might not discriminate between drugs as well as the more complex 
analyses. Despite the fact that this analysis discards half the data, it 
was found to discriminate somewhat better than did the standard 
within patient comparisons. 


METHODS 


All patients studied were either private or ward patients on the 
surgical service at the Harrisburg Polyclinic Hospital. They ranged 
in age from 21 to 65 years. 

Sixteen patients were eliminated from the study because they 
required too few injections for pain relief. Only those patients who 
required four injections in 24 hours were included. 

Nurse observers were used for questioning of patients. The nurse 
observers obtained permission from the operating surgeon to include 
each patient in the study program preceding surgery, then confirmed 
the decision after surgery. The patient was interrogated concerning 
previous experience with analgesic drugs and abnormal responses to 
medication. The most frequent operative procedures were abdominal 
and orthopedic, although a small number of thoracic surgery patients 
are also included. 

The patients were observed hourly for a minimum period of 16 
hours after their return from recovery room. These hourly observations 
by the nurses included the vital signs and the amount, character, and 
location of pain. The nurse was asked to classify the pain according to 
four grades: none, mild, moderate, and severe. Following injections, 
pain relief during each subsequent hour was recorded as greater than 
50 per cent or less than 50 per cent. 

Whenever the patient complained of severe wound pain (constant 
aching at the site of incision), in contradistinction to intermittent pain, 
or pain from organ distention, the nurse was instructed to give the 
patient a test drug. Each drug was administered whenever the patient 
complained of severe pain, except that it was never given more frequently
than once an hour. To be considered as providing pain relief
during a given hour post injection, a drug had to score more than 50
per cent pain relief for that hour.

[Tables 1a and 1b (hours of relief for each ampule, and summed hours of relief for each pair of identical ampules, arranged by order of drug presentation) are not recoverable from the scanned source.]

To make the study double-blind, each patient was assigned four 
ampules coded to indicate the patient number and the order of admini- 
stration. The first two doses given were identical and differed from the 
last two doses which were also identical. Two doses of each drug were 
administered in order to minimize the effects of interaction between 
drugs (Jackson [1956]). All drugs were administered intra-muscularly. 
The drugs studied were 75 mg. Demerol (D) and an exploratory drug
at levels of 1 mg. ($T_1$) and 3 mg. ($T_3$). Thus the study is an incomplete
block design with two drugs per block (patient). 

The number of hours reported as more than 50 per cent relief after 
each injection was recorded as the hours of relief for that ampule 
(Table 1a). The actual analyses were all run on the sum of the hours 
of relief reported for a pair of identical ampules given to the same patient 
(Table 1b). 


RESULTS 


The durations of relief reported by the patients are shown in Tables
1a and 1b arranged according to the order of drug presentation. A
number of characteristics are immediately apparent. 

First, it is clear that, on the whole, the duration of relief is longer 
during administration of the second drug, no matter which dose or 
order of presentation is used. It was hoped initially that over the first 
day post-operation the level of pain might be nearly constant, and that 
the first and second periods could be considered equivalent. Apparently 
the pain decreases or tolerance for pain increases fairly rapidly, and 
this initial assumption is clearly invalid. 

Second, the durations of relief during the first period suggest that
D is the most effective drug and $T_1$ the least effective. Thus the average
relief in the first period from D is (42 + 47)/(7 + 7) = 6.36 hours; the
average relief from $T_3$ is 4.31 hours; and the average relief from $T_1$
is 2.23 hours. Now it is gratifying to notice that the drugs are in fact
different, but on account of the time trend this fact adds a further
complication to the analysis. Namely, a drug given during the second
period to a patient who received D during the first period will show a
greater duration of relief, on the average, than it will if given to a
patient who had $T_1$ during the first period. The reason for this is that
if D, the long-acting drug, is given first, the second drug has the advantage
of being administered on the average 6.36 hours after the start of
the experiment. If $T_1$ is given first, the second drug is given on the
average about 2.23 hours after the start of the experiment. Thus we




334 BIOMETRICS, SEPTEMBER 1958 


see, for example, that when given after D, $T_3$ averages 8.57 hours of
relief; but when given after $T_1$, $T_3$ averages only 7.13 hours of relief.

As a result of this time trend effect, the within patient comparisons 
are biased in the direction of minimizing the differences between drugs. 
If the drug given first is a good one, the second drug will be given later 
than the average and will tend to give greater than average duration 
of relief. If the first drug is a poor one, the second drug will be given 
earlier than the average and will therefore give less than average duration 
of relief. 
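
The direction of this bias can be illustrated with a toy model. The sketch below assumes, purely for illustration, that relief increases linearly with the time at which a drug is started and that the second drug is started as soon as relief from the first ends. The function names and the `trend` value of 0.5 are hypothetical and are not estimated from the data; only the first-period means of D and $T_1$ are taken from the text.

```python
def relief(base, start_time, trend):
    """Hours of relief from a drug started `start_time` hours into the study.

    Hypothetical linear time trend: relief grows by `trend` hours for each
    hour that has elapsed since the start of the experiment.
    """
    return base + trend * start_time


def within_patient_estimate(base_a, base_b, trend):
    """Equal-weight average of the A-minus-B difference over both orders."""
    # Order A then B: B is started when relief from A ends (after base_a hours).
    diff_a_first = relief(base_a, 0.0, trend) - relief(base_b, base_a, trend)
    # Order B then A: A is started when relief from B ends (after base_b hours).
    diff_b_first = relief(base_a, base_b, trend) - relief(base_b, 0.0, trend)
    return 0.5 * (diff_a_first + diff_b_first)


# First-period means from the text: D = 6.36 hours, T1 = 2.23 hours.
true_difference = 6.36 - 2.23
estimate = within_patient_estimate(6.36, 2.23, trend=0.5)
print(true_difference, estimate)
```

Under this toy model the within patient estimate equals the first-period difference multiplied by (1 - trend/2), so any positive trend shrinks the apparent difference between a good drug and a poor one.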

We will exhibit three analyses for this body of data on the basis of 
which we will discuss the relative merits of simple one-way designs and 
designs which incorporate the own control feature. 


TABLE 2
ANALYSIS OF VARIANCE OF FIRST DOSES

Source                  D.f.      S.s.       M.s.
Drugs                     2     114.808     57.404
Error within Drugs       40     336.959      8.424
    within $T_1$         12      56.308      4.692
    within $T_3$         15     155.437     10.362
    within D             13     125.214      9.632
Total                    42     451.767


DRUG MEANS (IN HOURS)

$T_1$       $T_3$       Demerol
2.231       4.313       6.357


DRUG COMPARISONS

Comparison          Average Difference    Variance      t
D vs. $T_1$               4.126             1.247      3.69
D vs. $T_3$               2.044             1.129      1.98
$T_3$ vs. $T_1$           2.082             1.171      1.92




1. SIMPLE ONE-WAY ANALYSIS OF FIRST PERIOD ONLY


By far the simplest analysis is given by considering only the measure- 
ments of duration of relief in response to the first drug administered. 
The analysis is that for a single factor experiment with unequal sub-class 
numbers (Cochran and Cox [1957]) and is shown in Table 2. Since we 
take only one observation on each individual and the time from operation 
to administration of the drug is the same in all cases, aside from random 
fluctuations, there is no difficulty at all in the interpretation of the 
results. We can take no advantage of within comparisons, but all 
measurements are taken at peak pain intensities where we may expect 
to get maximum discrimination between good and poor agents. 
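
As a rough check on the one-way comparisons, the sketch below recomputes a drug comparison from the summary figures in Table 2. It assumes that the group sizes are the within-drug degrees of freedom plus one (13, 16, and 14 patients) and that the variance of a difference is the pooled error mean square times the sum of reciprocal group sizes; the function and variable names are illustrative only.

```python
import math

# Summary figures from Table 2 (first doses only).
means = {"T1": 2.231, "T3": 4.313, "D": 6.357}
# Assumed group sizes: within-drug degrees of freedom plus one.
sizes = {"T1": 13, "T3": 16, "D": 14}
error_mean_square = 8.424  # pooled error within drugs, 40 d.f.


def compare(a, b):
    """Return (difference, variance of difference, t) for drug a versus drug b."""
    difference = means[a] - means[b]
    variance = error_mean_square * (1.0 / sizes[a] + 1.0 / sizes[b])
    return difference, variance, difference / math.sqrt(variance)


difference, variance, t = compare("D", "T1")
print(round(difference, 3), round(variance, 3), round(t, 2))
```

For D versus $T_1$ this gives values that agree with the first comparison in Table 2 to within rounding.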


TABLE 3a
WITHIN PATIENT COMPARISON OF DRUGS

Analysis of Patient Differences

                              D.f.      S.s.       M.s.
Pooled Error Term              37     428.945     11.593
    From D - $T_1$              6     126.428     21.071
         $T_1$ - D              4       8.400      2.100
         D - $T_3$              6     109.429     18.238
         $T_3$ - D              7      33.938      4.848
         $T_1$ - $T_3$          7      43.750      6.250
         [the row for the sixth dose-order group, with 7 d.f., is not legible in the source]


Calculation of Average Mean Difference: Drug D vs. Drug $T_1$

\[ \tfrac{1}{2}\left[(9.400 - 1.600) + (6.000 - 6.143)\right] = 3.828. \]


Calculation of Variance of Difference

\[ \operatorname{Var} = .174\,s^2 = .174\,(11.593) = 2.017. \]


COMPARISONS

Comparison          Mean Difference    Variance      t
D vs. $T_1$              3.828           2.017      2.70
D vs. $T_3$              1.759           1.553      1.41
$T_3$ vs. $T_1$          1.125           1.449




2. LINEAR COMBINATIONS OF INTRA-INDIVIDUAL CONTRASTS 


In an experimental design such as this one, the effect of the time 
trend is to diminish within individual differences in apparent drug 
effects. In addition, the restriction to intra-individual contrasts reduces 
by about half the number of subjects available for estimation of a 
treatment difference. Thus a substantial gain in precision resulting 
from the within individual comparisons is required to make this design 
advantageous. Furthermore, since the time of administration of the 
second drug is of necessity quite variable, one must decide precisely 
which measure of average duration of relief should be used to compare 
two drugs. An obvious comparison of drugs A and B is obtained by
averaging the mean difference between the drugs when A precedes B
with mean difference when B precedes A. As the number of individuals
with A preceding B differs from the number with B preceding A, one
must take care to give equal weight to each mean despite the differences
in precision. A weighted mean would be biased because of the time
trend. The average contrast is a linear combination of the observations,
and its variance can thus be calculated directly. A sample calculation
and the results are shown in Table 3a.


TABLE 3b
BETWEEN PATIENT COMPARISON OF DRUGS

Analysis of Patient Totals

                              D.f.      S.s.       M.s.
Pooled Error Term              37     534.580     14.448
    From D - $T_1$              6      52.429      8.738
         $T_1$ - D              4      40.000     10.000
         D - $T_3$              6     121.714     20.286
         $T_3$ - D              7     130.937     18.705
         $T_1$ - $T_3$          7     123.750     17.679
         $T_3$ - $T_1$          7      65.750      9.393


Calculation of Average Mean Difference: Drug D vs. Drug $T_1$

[calculation not legible in the source]


Calculation of Variance of Difference

\[ \frac{2s^2}{4}\left(\frac{1}{7} + \frac{1}{8} + \frac{1}{8} + \frac{1}{8}\right) = .259\,s^2 = .259\,(14.448) = 3.742. \]


COMPARISONS

Comparison          Mean Difference    Variance      t
D vs. $T_1$              3.955           3.742      2.05
D vs. $T_3$              1.071           4.277       .52
$T_3$ vs. $T_1$          2.884           4.407      1.37

In Table 3a the estimates are based solely on the within individual 
comparisons. However, the difference in total duration of relief between
an individual given $T_1$ and D and an individual given $T_3$ and D gives
information on the difference between $T_1$ and $T_3$. Thus we can make
between patient comparisons using linear combinations of patient 
totals. These are exhibited in Table 3b. Again variances are estimated 
by using a pooled error term. 
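
The variance of such an equally weighted linear combination can be sketched as follows. The sketch assumes that each pooled error term is expressed per observation, so that a group mean of patient differences (or of patient totals) has variance 2s²/n, and that the group sizes are the tabulated degrees of freedom plus one. The function name is illustrative and does not come from the paper.

```python
def equal_weight_variance(s2, group_sizes):
    """Variance of a contrast that gives coefficient +1/2 or -1/2 to each group mean.

    Assumes each group mean (of patient differences or patient totals)
    has variance 2 * s2 / n, with s2 the pooled error mean square.
    """
    return (2.0 * s2 / 4.0) * sum(1.0 / n for n in group_sizes)


# Within patient comparison, D vs. T3: groups of 7 and 8 patients (Table 3a).
within_variance = equal_weight_variance(11.593, [7, 8])

# Between patient comparison, D vs. T1: four groups of 7, 8, 8, and 8 patients (Table 3b).
between_variance = equal_weight_variance(14.448, [7, 8, 8, 8])

print(round(within_variance, 3), round(between_variance, 3))
```

Both results agree with the corresponding variances in Tables 3a and 3b to within rounding of the printed coefficients.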

Finally, if we assume that patient totals and patient differences are 
uncorrelated, we can use an average of the above two estimates weighted 
inversely as their variances. A sample calculation and the results are 
shown in Table 3c. 
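
The inverse-variance weighting just described can be sketched in a few lines. The inputs are the within patient and between patient estimates for D versus $T_1$ from Tables 3a and 3b; the function name is illustrative, and the sketch relies on the same assumption as the text, namely that the two estimates are uncorrelated.

```python
def combine(estimate_within, variance_within, estimate_between, variance_between):
    """Combine two uncorrelated estimates with weights inverse to their variances."""
    w = 1.0 / variance_within
    w_prime = 1.0 / variance_between
    estimate = (w * estimate_within + w_prime * estimate_between) / (w + w_prime)
    variance = 1.0 / (w + w_prime)
    return estimate, variance


# D vs. T1: within patient (Table 3a) and between patient (Table 3b) results.
combined_estimate, combined_variance = combine(3.828, 2.017, 3.955, 3.742)
print(round(combined_estimate, 3), round(combined_variance, 3))
```

Because the weights are themselves estimated from the data, the variance returned here is somewhat too small, as the footnote to Table 3c points out.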


TABLE 3c
COMBINED COMPARISON OF DRUG DIFFERENCES

\[ w = \frac{1}{2.017} = .4958, \qquad w' = \frac{1}{3.742} = .2672 \]


Calculation of Average Mean Difference: Drug D vs. Drug $T_1$

\[ \frac{(w)\,3.828 + (w')\,3.955}{(w + w')} = 3.873. \]


Calculation of Variance of Difference

\[ \frac{1}{(w + w')} = \frac{1}{.4958 + .2672} = 1.311. \]


COMPARISONS

Comparison          Mean Difference    Variance¹      t
D vs. $T_1$              3.873           1.311       3.40
D vs. $T_3$              1.575           1.140       1.47
$T_3$ vs. $T_1$          1.560           1.090       1.50


¹Since the weights are estimated, these variances are somewhat too small (Meier [1953]).




The interpretation of the magnitude of the estimated difference when 
calculated by any of these three procedures is not at all obvious. Unlike
the first period only contrasts, we cannot easily specify a situation to
which the estimated duration of relief applies. It is, of course, an
average over a number of different situations. Now if our object is 
primarily estimation, this may be a serious objection to such a procedure. 
However, if we are only concerned to determine the best drug and to 
make correct significance tests, the above intra-block procedure is 
valid. (The inter-block procedure is probably fairly safe also.) That 
is, under the null hypothesis that all treatments are alike, the probability 
of declaring a given pair of them to be not alike is correctly specified 
by the nominal significance level. 


3. LEAST SQUARES ANALYSIS OF SIMPLE ADDITIVE MODEL


If the initial assumption of constant pain intensity had been valid, 
the most efficient estimate of drug differences would be that given by 
least squares. In fact, if there had been equal numbers of subjects for 
each order of presentation of drugs, the linear combination estimates 
would be the same as the least squares estimates. Because of the time 
trend, however, the least squares estimates are biased when, as here,
we have unequal numbers in the various categories. However, it may
be of some interest to examine the least squares estimates in this case to
see how different the results are from those of the linear combination
estimates. The analysis for the intra-block, inter-block, and combined
estimates is straightforward, and the results are shown in Tables 4a,
4b, and 4c. Since the design was only slightly unbalanced the results
are, as might be expected, only slightly different from those in Tables
3a, 3b, and 3c.


TABLE 4a
LEAST SQUARES INTRA-BLOCK ANALYSIS FOR DRUG DIFFERENCES

Source                           D.f.       S.s.        M.s.
Blocks                            42       607.256
Drugs (adj.)                       2       100.288
Intra-Block Residual              41       709.712     17.310
    Within Groups                 37       428.945     11.593
    Between Groups Residual        4       280.767     70.192
Total                             85     1,417.256


DRUG COMPARISONS

Comparison          Mean Difference    Variance      t
D vs. $T_1$              3.150           1.174      2.91
D vs. $T_3$              2.013           1.061      1.95
$T_3$ vs. $T_1$          1.137           1.023      1.12


TABLE 4b
LEAST SQUARES INTER-BLOCK ANALYSIS FOR DRUG DIFFERENCES

Source                              D.f.      S.s.       M.s.
Drugs                                 2     154.848
Blocks (adj.)                        42     552.694     13.310
    Blocks Within Groups             37     534.580     14.448
    Blocks Between Groups (adj.)      5      18.114      3.623


COMPARISONS

Comparison          Mean Difference    Variance      t
D vs. $T_1$              3.900           3.720      2.02
D vs. $T_3$              1.167           4.213       .57
$T_3$ vs. $T_1$          2.733           4.334      1.31


In Table 4 we have separated out the 37 error degrees of freedom 
calculated within each dose-order grouping adjusted for treatments. 
In view of the time trend, it is not surprising to find that the between 
dose-order grouping contribution is rather large since it includes effects 
of drug differences as well as error. If this effect is ignored, however, 
and the pooled error term is used, the variance will be overestimated 
and we will be less able to discriminate between drugs of unequal 
effectiveness. 
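
The size of this overestimate can be read off from the entries of Table 4a. The short sketch below simply recomputes the within-groups mean square and the pooled intra-block residual mean square from the tabulated sums of squares and degrees of freedom; the variable names are illustrative.

```python
# Sums of squares and degrees of freedom from Table 4a.
within_ss, within_df = 428.945, 37     # within dose-order groups
between_ss, between_df = 280.767, 4    # between dose-order groups residual


def mean_square(sum_of_squares, degrees_of_freedom):
    return sum_of_squares / degrees_of_freedom


within_ms = mean_square(within_ss, within_df)
pooled_ms = mean_square(within_ss + between_ss, within_df + between_df)

# Factor by which the pooled error term inflates the estimated variance.
inflation = pooled_ms / within_ms
print(round(within_ms, 3), round(pooled_ms, 3), round(inflation, 2))
```

Every variance computed from the pooled term is larger by this factor than one computed from the 37 within-group degrees of freedom, and the values of t shrink accordingly.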

When we turn to the combined estimates, we see that if the pooled 
error is used, the estimated variance component for block effects is 
negative since the inter-block error is smaller than the pooled intra-block 
error. In this case the usual practice is to ignore blocks and analyze 
the experiment as a completely randomized design. The pooled within 
and between block variance is not shown in our tables, but it is almost
twice the correct within patient variance. The calculations shown in
Tables 4a and 4c are based upon the error estimated from the 37 within
dose-order degrees of freedom only.


TABLE 4c
COMBINED INTRA- AND INTER-BLOCK ANALYSIS FOR DRUG DIFFERENCES

\[ w = \frac{1}{11.593} = .08626, \qquad w' = \frac{1}{14.448} = .06921 \]


Calculation of Average Mean Difference: Drug D vs. Drug $T_1$

\[ \frac{(w)(3.150) + (w')(3.900)}{(w + w')} \]

[the printed result of this calculation is not legible in the source]


Calculation of Variance of Difference

\[ \frac{1}{(w + w')} = \frac{1}{.08626 + .06921} = 6.435 \]


COMPARISONS

Comparison          Mean Difference    Variance¹      t
D vs. $T_1$              3.321                       3.51
D vs. $T_3$              1.879            .849       2.04
$T_3$ vs. $T_1$          1.442            .830       1.58


¹Since the weights are estimated, these variances are somewhat too small (Meier [1953]).
DISCUSSION 
The experiment described and analyzed above was designed in
accordance with the widely accepted principle in studies of pain relief
that good precision for estimates of drug differences requires that comparisons
be made within patients rather than between patients. The
results in this experiment, however, make it quite clear that a simpler
design, using only one drug for each patient, would have provided
better discrimination between drugs than did the more complex incomplete
block design. In fact, the values of t obtained using only the
first dose measurements were larger than those in any of the more
complex analyses. The incomplete block designs will have an advantage
when the variation between individuals is large compared to that within
individuals but, because of the reduced number of patients per comparison,
there will be a loss in precision when the variation between
individuals is fairly small. In this case post-operative pain appears to
decrease at a fairly rapid rate, so that the variation in duration of
relief experienced by a patient at different times after the operation
was of the same order as the variation between patients tested at the
same time. (It should be noted that if we had but two drugs to compare,
we would not suffer a loss in the number of patients per comparison
and the within patient analysis would show up in a more favorable
light.)

When drugs are administered in sequence with timing controlled by 
patient demand, as was the case here, the duration of relief experienced 
with the second drug will depend on the effectiveness of the first drug 
because of the continuous decrease in pain with time after operation. 
However, there seems to be no administratively feasible procedure for 
fixing the time at which the second dose will be given except to supply 
it upon demand. The consequence of this requirement is that within 
patient comparisons between drugs will tend to reduce the apparent 
differences between good and poor drugs. 

One possibility which might in part ameliorate this difficulty is to 
use a transformation of the time scale. Thus, two hours of relief immedi- 
ately after the operation might be the equivalent of six or eight hours 
of relief half a day later. Unfortunately, comparisons in scaled units 
are not too easy to interpret. Since the proper transformation is likely 
to be dependent on the details of each individual experiment, we did 
not investigate this possibility in detail. Another possibility would be 
to generalize the model for drug effects so as to fit the facts more closely 
and to include terms related to times of administration and their 
interactions with the several treatments. The analysis of such a 
model would be a fairly complex undertaking and, although it might 
point to improvements in experimental designs, this type of analysis 
does not seem well suited for routine comparison of drugs. 

If one turns to a design using only one drug per patient, it may be 
possible to increase precision by a number of devices. If the patient is 
supplied with the single drug on a demand schedule, some other measure 
such as the number of doses demanded in 24 hours may be found to 
give more precise comparisons. Alternatively, if the experimental 
design requires control over the patient for only the first one or two 
administrations of drug, it may be possible to include a larger group of 
patients in the study. In addition, such a plan should reduce the 
frequency of loss of subjects on account of failure to complete the 
treatment schedule. (In this study 16 patients were discarded because 
less than four ampules were required.) 




The results in this study should not necessarily be assumed to apply 
in all cases. We deliberately chose the total duration of relief for two 
administrations of each drug as our measure in order to minimize carry- 
over effects (Jackson [1956]). In a study with a single administration 
per drug it may be possible to give all treatments to each patient and 
thus avoid the loss of information inherent in incomplete block designs. 
However this design has the potentially serious disadvantage of in- 
creasing the effects of drug interactions. In any case, a proper analysis 
allowing for the effects of order of administration would be more complex 
than most researchers would care to consider. When all is said and 
done, the gain in discrimination (beyond first period simple comparisons) 
which might be attained by using within patient contrasts, even with 
the addition of inter-patient information, seems hardly worth the 
additional complexity in analysis and in the interpretation of the esti- 
mated differences. 

Of course, we do not have evidence on which to question the validity 
of within patient comparisons for types of pain which we have not 
studied. Also, we must recognize the possibility that our subjects are
more homogeneous than is the case in most studies. In other cases
the within patient contrasts may yield substantial gains, or it may be
found that it is advisable to group subjects into homogeneous subsets
before assigning them to treatment categories.

The point to be emphasized is that in studies of pain relief, no general
claim as to the merit of within individual comparisons versus between
individual comparisons can be valid in all circumstances. The
issue is an empirical one, and only a series of properly designed experi- 
ments can make clear the situations in which one or the other technique 
will be of most benefit. 


REFERENCES 


Beecher, H. K. [1957]. The measurement of pain. Prototype for the quantitative
study of subjective responses. Pharmacological Reviews 9:59.

Cochran, William G. and Cox, Gertrude M. [1957]. Experimental Designs, 2nd ed., 
John Wiley & Sons, Inc., New York. 

Jackson, G. L. and Smith, David A. [1956]. Analgesic properties of mixtures of 
Chlorpromazine with Morphine and Meperidine. Annals Int. Med. 45:640. 
Keats, Arthur S. [1957]. Postoperative pain: research and treatment. J. Chronic
Diseases. In press.
Meier, Paul [1953]. Variance of a weighted mean. Biometrics 9:59. 




INTERACTION OF GENOTYPE AND ENVIRONMENT IN 
CONTINUOUS VARIATION: I. DESCRIPTION 


KENNETH MATHER AND R. MORLEY JONES


Unit of Biometrical Genetics, Agricultural Research Council, 
Department of Genetics, University of Birmingham 
Birmingham, England 


Genes can take part in four types of interaction in producing their 
effects. First, allelic genes may interact to give dominance, or even 
over-dominance where the heterozygote transcends in its effects the 
range of expression delimited by homozygotes. Second, non-allelic 
genes may interact in a variety of ways which have received various 
names in classical genetics, but which are commonly pooled under the 
comprehensive heading of epistasis in the study of continuous variation. 
Third, genes may interact with permanent or semi-permanent com- 
ponents of the cytoplasm. And fourth, they may interact with agencies 
external to the organism and hence referable to the environment. 

The distinctions among these classes of interactions may not always 
be final. Although persistent, components of the cytoplasm may 
themselves be products of earlier gene action, even the products of 
action of genes no longer present in the nucleus with which the cyto- 
plasm is associated. Equally, an organism may characteristically alter 
its environment and in this way express the earlier action of genes. 
To the extent that this is the case, some cytoplasmic and environmental 
interactions may ultimately be referable to wholly genic interactions;
but even so, they may be still more conveniently handled in their 
separate classes. In such a matter experience must be the guide. 

Provision has been made for the part played by dominance, or 
allelic interaction, right from the first in biometrical genetics [Fisher 
1918]: indeed the method of describing, and hence of measuring the 
contribution that dominance makes to the variation, has varied but 
slightly from the pattern Fisher originally set. Fisher considered 
epistasis too, but the description and measurement of these non-allelic 
interactions have posed more difficult problems. A general method of 
describing and classifying them has, however, now been devised [Cockerham
1954, Anderson and Kempthorne 1954, Hayman and Mather 1955],
though admittedly the experimental problems of their measurement
have still to be tackled successfully. By contrast the theory of cytoplasmic
and environmental interactions has received but scant attention.
There has indeed been little to suggest that the interaction between
nuclear genes and permanent components of the cytoplasm is of great
moment in continuous variation, but the growing mass of evidence of
interaction between the genotype and environment [Mather 195…]
lends urgency to the need for its closer consideration.


TABLE 1.
THE FOUR PHENOTYPES OF TWO GENOTYPES IN TWO ENVIRONMENTS EXPRESSED
IN TERMS OF PARAMETERS $d_a$, $e_1$, AND $g_1$, REPRESENTING GENETIC, ENVIRONMENTAL,
AND INTERACTIVE EFFECTS.

                                   Genotype
                        AA                       aa                   Mean
Environment   X    $d_a + e_1 + g_1$      $-d_a + e_1 - g_1$        $e_1$
              Y    $d_a - e_1 - g_1$      $-d_a - e_1 + g_1$        $-e_1$
Mean               $d_a$                  $-d_a$                     0


Genotype-environment interactions in the simplest case 


The simplest case to consider is that of two true-breeding lines,
differing by a single gene substitution, in two environments. Four
situations are then possible as shown in Table 1, and three parameters
are necessary to describe completely the differences among the four
phenotypes. One of these parameters can be the familiar genetic
quantity $d_a$, by which the phenotype of AA when averaged over
environments exceeds and that of aa similarly falls short of the overall
average value taken as 0. A similar parameter $e_1$ may be used to
represent the differences between the two environments, being the
amount by which the phenotype in environment X when averaged over
genotypes exceeds and that in environment Y similarly falls short,
of the overall mean. The two comparisons represented by $d_a$ and $e_1$
are orthogonal to one another when the four situations are equally
common, as we have assumed them to be. The third comparison will
measure the statistical interaction of the genetical and environmental
components represented by $d_a$ and $e_1$. It may be represented by the
parameter $g_1$, taken as the amount added by the interaction to the
phenotypes of AA in environment X and aa in environment Y, and
deducted from the phenotypes of AA in Y and aa in X. The interaction
comparison is then orthogonal to the other two, and between them $d_a$
and $e_1$ and $g_1$ will completely describe the relations among the four
phenotypes. It will be observed that $g_1$, representing the genotype-
environment interaction, bears the same relation to $d_a$ and $e_1$ as $i_{ab}$,
$j_{ab}$, $j_{ba}$, and $l_{ab}$, representing digenic interactions, do to the main
genic effects $d_a$, $d_b$, $h_a$, and $h_b$ [Hayman and Mather 1955].
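
The orthogonality of the three comparisons can be checked numerically. The sketch below builds the four phenotypes from given parameter values and then recovers the parameters with the three orthogonal contrasts; the function names and the numerical values are hypothetical, and the plain names `d`, `e`, and `g` stand for $d_a$, $e_1$, and $g_1$.

```python
def phenotypes(d, e, g):
    """Four phenotypes as deviations from the overall mean (taken as 0)."""
    return {
        ("AA", "X"): d + e + g,
        ("aa", "X"): -d + e - g,
        ("AA", "Y"): d - e - g,
        ("aa", "Y"): -d - e + g,
    }


def parameters(p):
    """Recover (d, e, g) from the four phenotypes by orthogonal contrasts."""
    # Genetic comparison: AA minus aa, averaged over environments.
    d = (p["AA", "X"] + p["AA", "Y"] - p["aa", "X"] - p["aa", "Y"]) / 4
    # Environmental comparison: X minus Y, averaged over genotypes.
    e = (p["AA", "X"] + p["aa", "X"] - p["AA", "Y"] - p["aa", "Y"]) / 4
    # Interaction comparison: (AA in X, aa in Y) minus (AA in Y, aa in X).
    g = (p["AA", "X"] + p["aa", "Y"] - p["AA", "Y"] - p["aa", "X"]) / 4
    return d, e, g


cells = phenotypes(3.0, 2.0, 1.0)  # hypothetical parameter values
recovered = parameters(cells)
print(cells, recovered)
```

The round trip returns the values that were put in, which is what complete description by three orthogonal parameters amounts to when the four situations are equally common.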

When all four phenotypes are different, six qualitatively distinct
types of relation can exist among them. These have been listed by
Haldane [1946] and are set out in Table 2, where 1, 2, 3, and 4 refer to
the serial order into which the phenotypes fall in respect of the character
by which they are measured. Thus 1 represents the largest, healthiest,
fittest, or what else it may be, phenotype; 2 is the phenotype next in size,
health, or fitness, etc.; 3 the next; and 4 the smallest, least healthy,
or least fit, etc., of all. Then where 1-4 represents the difference between
phenotypes 1 and 4, and so on, it is clear that 1-4 > 1-3 > 1-2, and
equally 1-4 > 2-4 > 3-4; though the relative magnitudes of 1-2, 2-3
and 3-4 cannot be specified. It is not difficult to see that this allows
us to specify the six types of relation by the relative magnitudes of
$d_a$, $e_1$ and $g_1$. Thus Haldane's relation 1a (see Table 2) gives 1-4 =
$2(d_a + e_1)$, 1-3 = $2(d_a + g_1)$, and 1-2 = $2(e_1 + g_1)$ so that 1-4 > 1-3 >
1-2 is satisfied if, and only if, $d_a > e_1 > g_1$. The specifications of all
six types of relation in terms of $d_a$, $e_1$, and $g_1$ are shown in Table 2.
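
The correspondence between the ordering of the parameters and Haldane's six relations can be sketched as a simple lookup. The sketch assumes three distinct parameter values, labelled so that AA in environment X is the largest phenotype; the function names and the numerical values are hypothetical, and `d`, `e`, and `g` stand for $d_a$, $e_1$, and $g_1$.

```python
# Ordering of the parameters (largest first) mapped to Haldane's relation.
RELATIONS = {
    ("d", "e", "g"): "1a",
    ("e", "d", "g"): "1b",
    ("e", "g", "d"): "2",
    ("d", "g", "e"): "3",
    ("g", "d", "e"): "4a",
    ("g", "e", "d"): "4b",
}


def haldane_relation(d, e, g):
    """Classify by relative magnitude; assumes d, e, and g are all different."""
    values = {"d": d, "e": e, "g": g}
    if len(set(values.values())) < 3:
        raise ValueError("d, e, and g must all be different")
    order = tuple(sorted(values, key=values.get, reverse=True))
    return RELATIONS[order]


def ranks(d, e, g):
    """Serial order (1 = largest) of the four phenotypes."""
    cells = {
        ("AA", "X"): d + e + g,
        ("aa", "X"): -d + e - g,
        ("AA", "Y"): d - e - g,
        ("aa", "Y"): -d - e + g,
    }
    ordered = sorted(cells, key=cells.get, reverse=True)
    return {cell: position + 1 for position, cell in enumerate(ordered)}


print(haldane_relation(3, 2, 1), ranks(3, 2, 1))
print(haldane_relation(1, 2, 3), ranks(1, 2, 3))
```

With these illustrative values the first call falls in relation 1a and the second in relation 4b, and the serial orders of the four phenotypes match the corresponding rows of Table 2.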

With the four phenotypes all different, specification is by inequalities;
but, if two or more phenotypes are alike, equalities must be introduced.
Thus for a relation basically of type 1b the gene difference is effective
in one environment but not in the other, so that phenotypes 1 and 2
are equal, the relation being specified by adding the condition $d_a = -g_1$.
Where phenotypes 1, 2, and 3 are all alike, as might be the case if
for example the gene governed susceptibility or resistance to an infective
disease and the environments differed in infectivity, the condition
becomes $d_a = e_1 = -g_1$, and specification is complete.

Turning back to the more general case where all phenotypes are
different, it is easy to show that where the differences between genotypes
in both environments are small as compared with the differences between
environments, $d_a$ and $g_1$ must both be smaller than $e_1$. As Haldane
has pointed out, the relation must then be of type 1b or 2, according to
whether $d_a$ exceeds or falls short of $g_1$. Similarly in the reverse case
where the environmental difference is smaller than the genetic, the
relation must be of type 1a or 3, since $d_a$ must be larger than both $e_1$
and $g_1$. Haldane further concluded that, when both the environmental
and the genetic differences are small, the relation can only be of type 1,
but this apparently simple extrapolation is unjustified. It fails because,
instead of specifying the magnitude of the parameters relative to one
another as before, it merely states that all are small in an absolute
sense, presumably by reference to some external consideration. Internally,
therefore, their relative magnitudes remain undefined, so
that all six types of relation are possible among the phenotypes, and this
is true as much when all phenotypic differences are “small” as when
they are “large.”


TABLE 2
HALDANE'S SIX RELATIONS OF FOUR DIFFERENT PHENOTYPES GIVEN BY TWO
GENOTYPES IN TWO ENVIRONMENTS, SHOWING THEIR DEFINITIONS BY THE
RELATIVE MAGNITUDE OF $d_a$, $e_1$, AND $g_1$.

Relation    Environment      Genotype       Definition
                             AA     aa

1a               X            1      3      $d_a > e_1 > g_1$
                 Y            2      4

1b               X            1      2      $e_1 > d_a > g_1$
                 Y            3      4

2                X            1      2      $e_1 > g_1 > d_a$
                 Y            4      3

3                X            1      4      $d_a > g_1 > e_1$
                 Y            2      3

4a               X            1      4      $g_1 > d_a > e_1$
                 Y            3      2

4b               X            1      3      $g_1 > e_1 > d_a$
                 Y            4      2

The numbers 1, 2, 3 and 4 denote the order of the phenotypes in respect of magnitude of expression.

The type 4 relations require that $g_1$ is larger than both $d_a$ and $e_1$.
Thus specific adaptation of alternative genotypes to alternative environments
not merely can spring from, but in fact requires, a genotype-environment
interaction larger than the genetic and environmental
effects as defined in our present way. Any move from other relations
to those of type 4 must, therefore, depend on a rise in relative magnitude
of the genotype-environment interaction.

This brings us to the last point which must be made about this
simple set of relations. It has been discussed for the sake of convenience
in terms of two homozygous genotypes differing only by a single gene
substitution. It can obviously be extended to any pair of genotypes,
or even pair of mixtures of genotypes, provided the mixtures are constant
over environments, by substituting for $d_a$ a genetic measure
appropriately compounded of the d, h, i, j, and l components of variation
[Hayman and Mather 1955] relating to the genes in which the genotypes,
or mixtures of genotypes, constantly differ. The parameter measuring
genotype-environment interactions will then become correspondingly
compound, as will be shown below. These compound parameters will,
however, differ from $d_a$ and $g_a$, which they replace, in that their
magnitudes, relative both to one another and to $e_1$, will be capable of
adjustment each by its own internal compensation, that is to say, by genetic
reshuffling of the contrasting genotypes. In broad terms, this is to say no
more than that, as has long been obvious, the rise of specific adaptations
must depend on the emergence and fixation by selection of appropriate
genotypes; but in so far as we can now relate it to the parameters
describing genic actions and their interactions with the effects of
environment we can proceed to a detailed theoretical investigation of the
problem.


Description of interactions in the general case 


Extending considerations to include the heterozygote Aa as well
as the two homozygotes, AA and aa, in two environments, six situations
are possible and five parameters will be necessary to specify completely
the differences among the six phenotypes. Three of these, $d_a$, $e_1$,
and $g_a$, we already have. A fourth, $h_a$, is provided by the customary
method of expressing the effect of the heterozygous genotype in
biometrical terms, and a corresponding parameter representing the
interaction of the $h_a$ and $e_1$ comparisons can be brought in to complete
the tally. This will be designated as $g_{ha}$, and the earlier $g_a$ will now be
written as $g_{da}$ to emphasise its corresponding relation to d. The
phenotypic comparison of heterozygote with the mean of the two homozygotes,
which $h_a$ represents, is independent of $d_a$ measuring the phenotypic
difference between the homozygotes. Thus $g_{ha}$ will also be independent
of $g_{da}$, and the description is complete as shown in Table 3.
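
That five parameters suffice, and can be recovered uniquely from the six phenotypes, can be illustrated with a short sketch. It assumes the sign pattern in which the second environment reverses the signs of e and of both g's, with phenotypes measured from the mid-parent averaged over the two environments; the function names are hypothetical.

```python
def six_phenotypes(d, h, e, g_d, g_h):
    """Phenotypes of AA, Aa and aa in environments X and Y."""
    return {
        "X": {"AA": d + e + g_d, "Aa": h + e + g_h, "aa": -d + e - g_d},
        "Y": {"AA": d - e - g_d, "Aa": h - e - g_h, "aa": -d - e + g_d},
    }


def recover(p):
    """Invert six_phenotypes: return (d, h, e, g_d, g_h)."""
    x, y = p["X"], p["Y"]
    # Homozygote differences give d and its interaction with environment.
    d = ((x["AA"] - x["aa"]) + (y["AA"] - y["aa"])) / 4
    g_d = ((x["AA"] - x["aa"]) - (y["AA"] - y["aa"])) / 4
    # Mid-homozygote values in each environment are +e and -e.
    mid_x = (x["AA"] + x["aa"]) / 2
    mid_y = (y["AA"] + y["aa"]) / 2
    e = (mid_x - mid_y) / 2
    # Heterozygote departures from the mid-homozygote give h and g_h.
    h = ((x["Aa"] - mid_x) + (y["Aa"] - mid_y)) / 2
    g_h = ((x["Aa"] - mid_x) - (y["Aa"] - mid_y)) / 2
    return d, h, e, g_d, g_h
```

The comparisons used for d and g_d involve only the homozygotes, while those for h and g_h involve the heterozygote's departure from the homozygote mean, which is the independence noted in the text.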

The system is readily extended to cover cases of more than one 
gene substitution. Each gene difference that is added brings in a further 




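TABLE 3
THE SIX PHENOTYPES FROM THE THREE GENOTYPIC COMBINATIONS OF ALLELES
A AND a IN TWO ENVIRONMENTS.


                                           Genotype
Environment   AA                     Aa                     aa                      Mean

X             $d_a + e_1 + g_{da}$   $h_a + e_1 + g_{ha}$   $-d_a + e_1 - g_{da}$   $e_1 + \tfrac13 h_a + \tfrac13 g_{ha}$
Y             $d_a - e_1 - g_{da}$   $h_a - e_1 - g_{ha}$   $-d_a - e_1 + g_{da}$   $-e_1 + \tfrac13 h_a - \tfrac13 g_{ha}$

Mean          $d_a$                  $h_a$                  $-d_a$                  $\tfrac13 h_a$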


d and a further h. A further $g_d$ and a further $g_h$ can be introduced to
accommodate their interactions with the environment. The various
d's and h's are distinguished by subscripts as $d_a$, $d_b$, $h_a$, $h_b$, etc.,
denoting the gene differences to which they attach, and the g's can be
similarly distinguished as $g_{da}$, $g_{db}$, and so on. The g's will be as
independent of one another in their contributions to the variation as are
the various d's and h's. In the same way, the parameters i, j and l,
representing non-allelic interactions among the genes, can be used to
generate $g_i$, $g_j$, and $g_l$, representing the compound interactions of
these genic interactions with the environmental difference. The system can be
developed as far as is desired, every genetic component of variation
being matched by a corresponding g. Each g will be independent of its
parent genetical parameter in its contribution to the variation, and the
various g's will be expected to match the wholly genetical parameters
in their relations to one another.

Turning to the case of more than two environments, it may be 
noted that no matter how numerous they may be, their differences are 
expressible by a series of orthogonal comparisons equal in number to the
degrees of freedom between them. With two environments there is one
such comparison, represented in our simple case by $e_1$. With three
environments there are two comparisons which can be represented by
parameters denoted as $e_1$ and $e_2$. Four environments yield three
comparisons so that $e_1$, $e_2$, and $e_3$ will be required, and so on. The
average phenotypes of the different environments can be combined in 
a variety of ways to give appropriate sets of orthogonal comparisons 
[see Mather 1943, Chapter 6], but unless there is a relation of special 
interest to us among the environments we may choose whatever set of 
comparisons happens to be most convenient for representing the
environmental differences. So long as the member comparisons of a set are
orthogonal to one another, the sum of their squares will necessarily be
equal to the sum of squares of deviations of the environmental means 
from the general mean. As a group therefore, they will represent the 
environmental component of variation. 
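
A numerical illustration of this property follows, as a sketch under stated assumptions: four environments, the particular set of +1/-1 comparisons used for W, X, Y and Z in Table 4, and each e defined as the comparison divided by the number of environments. With that scaling the sum of squares of the e's equals the mean squared deviation of the environmental means (the sum of squares of deviations divided by four). The function names are illustrative.

```python
# Coefficients of e1, e2, e3 across environments W, X, Y, Z.
CONTRASTS = [(1, 1, -1, -1), (1, -1, 1, -1), (1, -1, -1, 1)]


def environmental_parameters(means):
    """Orthogonal comparisons e1, e2, e3 among four environmental means."""
    n = len(means)
    return [sum(c * m for c, m in zip(row, means)) / n for row in CONTRASTS]


def mean_square_deviation(means):
    """Mean squared deviation of the environmental means from the general mean."""
    grand = sum(means) / len(means)
    return sum((m - grand) ** 2 for m in means) / len(means)
```

For example, means of 10, 6, 3 and 1 give comparisons of 3, 1.5 and 0.5, whose squares sum to 11.5, the mean squared deviation of the four means about their general mean of 5. Any other orthogonal set with the same scaling would give the same total.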

As the number of e's is increased to accommodate more environments,
so the number of g's will grow too. Thus, considering the case of the
three genotypes AA, Aa, and aa in three environments, we shall
have the two genetical parameters, $d_a$ and $h_a$, and two environmental
parameters $e_1$ and $e_2$. There will then be four g's, namely $g_{da1}$ and
$g_{da2}$ representing the interactions of the component of heritable variation
with the two environmental comparisons, and $g_{ha1}$ and $g_{ha2}$ representing
the corresponding interactions of the dominance component of variation.
With the same three genotypes in four environments, eleven parameters
are needed for specification of the differences among the twelve
phenotypes. One way of assigning them is shown in Table 4; but it should
be remembered that while the two parameters $d_a$ and $h_a$ are fixed by the
requirements of meaningful genetical analysis, the three e comparisons

TABLE 4
THE TWELVE PHENOTYPES FROM THE THREE GENOTYPIC COMBINATIONS OF ALLELES
A AND a IN FOUR ENVIRONMENTS.


                                                   Genotype
Environment   AA                                                  Aa                                                  aa

W             $d + e_1 + e_2 + e_3 + g_{d1} + g_{d2} + g_{d3}$   $h + e_1 + e_2 + e_3 + g_{h1} + g_{h2} + g_{h3}$   $-d + e_1 + e_2 + e_3 - g_{d1} - g_{d2} - g_{d3}$
X             $d + e_1 - e_2 - e_3 + g_{d1} - g_{d2} - g_{d3}$   $h + e_1 - e_2 - e_3 + g_{h1} - g_{h2} - g_{h3}$   $-d + e_1 - e_2 - e_3 - g_{d1} + g_{d2} + g_{d3}$
Y             $d - e_1 + e_2 - e_3 - g_{d1} + g_{d2} - g_{d3}$   $h - e_1 + e_2 - e_3 - g_{h1} + g_{h2} - g_{h3}$   $-d - e_1 + e_2 - e_3 + g_{d1} - g_{d2} + g_{d3}$
Z             $d - e_1 - e_2 + e_3 - g_{d1} - g_{d2} + g_{d3}$   $h - e_1 - e_2 + e_3 - g_{h1} - g_{h2} + g_{h3}$   $-d - e_1 - e_2 + e_3 + g_{d1} + g_{d2} - g_{d3}$

Mean          $d$                                                 $h$                                                 $-d$

Mean column: the mean of each environment can be represented as
$e + \tfrac13 h + \tfrac13 g_h$ (see below); the overall mean is $\tfrac13 h$.


Each line can be written as $(d;\ h;\ -d) + e + (g_d;\ g_h;\ -g_d)$ where
$e = \sum_{i=1}^{3}(e_i)$, $g_d = \sum_{i=1}^{3}(g_{di})$ and $g_h = \sum_{i=1}^{3}(g_{hi})$,
all summations taking sign into account, and $g_d$ having the sign appropriate
to the phenotype AA.
Note: the subscript a, as used in Table 3 to indicate that the d, h, $g_d$'s, and $g_h$'s are those attaching
to the gene pair A-a, has been omitted for brevity.




and with them the six g's might be assigned in a great variety of ways
all equally appropriate for genetical purposes. This will always be so, 
unless there is among the environments some special relation which 
would favour the use of a particular set of e and g comparisons. The 
e comparisons in Table 4 are thus to be regarded as but one of the 
many possible partitions of the environmental differences. The g 
comparisons will of course always follow from the particular partitions 
used for the e’s. 


Contribution of interaction to variation 


Where the chances of occurrence of the various genotypes are the 
same in all environments the overall mean phenotype will be independent 
of the genotype-environmental interaction as represented by the g items; 
though, of course, its sampling variation will reflect the magnitudes and 
relations of the g’s. The effects of the interaction on the phenotypic 
variances can be illustrated from the case of a single gene difference in 
four environments as set out in Table 4. 

The variance of a line pure for the AA genotype over the four
environments can easily be shown to be $(e_1 + g_{d1})^2 + (e_2 + g_{d2})^2 +
(e_3 + g_{d3})^2$ when taken about its own mean of d. Taken about the
mid-parent value of 0, this variance is of course increased by $d^2$. The
variance of an aa line round its own mean of $-d$ is similarly
$(e_1 - g_{d1})^2 + (e_2 - g_{d2})^2 + (e_3 - g_{d3})^2$, again with $d^2$ added
when taken round the mid-parent value. In the same way the variance of the
heterozygote Aa is $(e_1 + g_{h1})^2 + (e_2 + g_{h2})^2 + (e_3 + g_{h3})^2$
round its mean of h, with $h^2$ added when taken round the mid-parent value.

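The variance of any population including all these genotypes can be
found from these results. Thus in the $F_2$ of the cross AA × aa the three
genotypes occur in the proportions $\tfrac14 : \tfrac12 : \tfrac14$ and, since
$(e_1 + g_{d1})^2 + (e_1 - g_{d1})^2 = 2(e_1^2 + g_{d1}^2)$, etc., the variance
round the mid-parent becomes
$\tfrac12 d^2 + \tfrac12 \sum (e_i)^2 + \tfrac12 \sum (g_{di})^2 + \tfrac12 h^2 + \tfrac12 \sum (e_i + g_{hi})^2$,
where $\sum (e_i)^2 = e_1^2 + e_2^2 + e_3^2$, etc. The overall phenotypic mean
is $\tfrac12 h$ so that the variance of $F_2$ round its own mean, obtained by
subtracting $(\tfrac12 h)^2$, becomes

$$
\begin{aligned}
V_{F_2} &= \tfrac12 d^2 + \tfrac12 \sum (e_i)^2 + \tfrac12 \sum (g_{di})^2 + \tfrac14 h^2 + \tfrac12 \sum (e_i + g_{hi})^2 \\
        &= \tfrac12 d^2 + \tfrac12 \sum (g_{di})^2 + \tfrac14 h^2 + \tfrac14 \sum (g_{hi})^2 + \sum (e_i + \tfrac12 g_{hi})^2 .
\end{aligned}
$$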


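Now if we denote the environmental and interactive increments to
the phenotype as e, $g_d$, and $g_h$ respectively, so that with respect to any
phenotype e is the sum of $e_1$, $e_2$, and $e_3$ taking sign into account, and
$g_d$ and $g_h$ are similarly the algebraic sums of $g_{d1}$, etc. and $g_{h1}$, etc., it
can be shown that

$$\sum (e_i)^2 = V_e, \qquad \sum (g_{di})^2 = V_{g_d}, \qquad \sum (g_{hi})^2 = V_{g_h},$$

$V_e$ being the variance of e and so on.
Then

$$\sum (e_i + \tfrac12 g_{hi})^2 = V_{(e + \frac12 g_h)}$$

and

$$V_{F_2} = \tfrac12 d^2 + \tfrac12 V_{g_d} + \tfrac14 h^2 + \tfrac14 V_{g_h} + V_{(e + \frac12 g_h)} .$$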


In this form the environmental and interactive components are
independent of the particular breakdown of the e, $g_d$, and $g_h$ comparisons.
The expression is thus not limited in its applicability to the case where 
comparisons among the four environments take the form of Table 4. 
Indeed, in this form the expression applies to any number of environ- 
ments. 
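
The expression for the variance of F2 can be checked by direct enumeration. The sketch below assumes the four-environment layout of Table 4 (sign patterns + + +, + - -, - + -, - - + for W, X, Y, Z), genotype proportions of 1/4 : 1/2 : 1/4 in every environment, and variances taken with the number of cells as divisor; the function names and the trial values are illustrative only.

```python
SIGNS = [(1, 1, 1), (1, -1, -1), (-1, 1, -1), (-1, -1, 1)]  # W, X, Y, Z


def signed_sum(signs, values):
    """Algebraic sum of the three comparisons for one environment."""
    return sum(s * x for s, x in zip(signs, values))


def f2_variance_direct(d, h, e, g_d, g_h):
    """Variance of F2 over genotypes and environments, by enumeration."""
    freq = {"AA": 0.25, "Aa": 0.5, "aa": 0.25}
    cells = []
    for s in SIGNS:
        env = signed_sum(s, e)
        value = {
            "AA": d + env + signed_sum(s, g_d),
            "Aa": h + env + signed_sum(s, g_h),
            "aa": -d + env - signed_sum(s, g_d),
        }
        cells += [(freq[gt] / len(SIGNS), value[gt]) for gt in freq]
    mean = sum(w * x for w, x in cells)
    return sum(w * (x - mean) ** 2 for w, x in cells)


def f2_variance_formula(d, h, e, g_d, g_h):
    """(1/2)d^2 + (1/2)V_gd + (1/4)h^2 + (1/4)V_gh + V_(e + g_h/2)."""
    v_gd = sum(x * x for x in g_d)
    v_gh = sum(x * x for x in g_h)
    v_env = sum((ei + 0.5 * gh) ** 2 for ei, gh in zip(e, g_h))
    return 0.5 * d * d + 0.5 * v_gd + 0.25 * h * h + 0.25 * v_gh + v_env
```

The two functions agree for any trial values of the eleven parameters, since the agreement rests only on the orthogonality of the three sign patterns.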

The interaction components, $g_d$ and $g_h$, have means of 0 over all
environments, but the heritable variation measured within environments
includes $g_d$ and $g_h$ as well as d and h. The variances, $V_{g_d}$ and $V_{g_h}$,
may, therefore, also be regarded as the variances of d and h, as measured
in each environment, round their overall means which, being averages
over all environments, are free from $g_d$ and $g_h$.

If the environments are distinguishable, as for example where each 
is a block in a replicated experiment, the mean phenotype of each 
environment or block is given by $\tfrac12 h + e + \tfrac12 g_h$, so that the
variance of environmental or block means round the grand mean becomes
$V_{(e + \frac12 g_h)}$, which is the last term in $V_{F_2}$. The effect of the
heterozygote's interaction is therefore expressed partly by inflation of the
estimate of environmental variation. The remainder of the effect of
the heterozygote's interaction and the whole of the homozygote's
interaction remain with the true genetical variation after deduction
from $V_{F_2}$ of the environmental variation as measured by differences
among the environmental means. The genetic and interactive
components of variation remaining with them could, of course, be separated
if the individuals of the different genotypes were regularly identifiable, 
but in continuous variation this is not generally possible. In general, 
therefore, the heritable variation as measured by a simple analysis of 
variance will be inflated by such genotype-environment interactions 
as may exist. 

The amount of interactive variation confounded with the environmental
differences varies with the constitution of the mixture of genotypes.
Thus, if instead of an $F_2$ we have an $F_3$ population, the overall
proportions of the three genotypes will be $\tfrac38 : \tfrac14 : \tfrac38$. When such a
population is distributed at random over the environments the total
variation can be shown, by methods similar to those used above, to be
$V_{F_3} = \tfrac34 d^2 + \tfrac34 V_{g_d} + \tfrac{3}{16} h^2 + \tfrac{3}{16} V_{g_h} + V_{(e + \frac14 g_h)}$.
This can, of course, be broken down into two parts, the variance of family
means ($V_{1F_3}$) and the mean variance of $F_3$ families ($V_{2F_3}$), where each
family is descended by self-fertilisation from a single $F_2$ individual. Then
$V_{F_3} = V_{1F_3} + V_{2F_3}$. The heritable variation is separable [Mather
1949a and b] into $V_{1F_3} = \tfrac12 d^2 + \tfrac{1}{16} h^2$ and
$V_{2F_3} = \tfrac14 d^2 + \tfrac18 h^2$; but the
associated interactive variation does not necessarily show a corresponding
separation. The precise partition among the two variances will in
fact depend on the structure of the population and the design of the
experiment in which the variation is measured. With each $F_3$ family
consisting of a large number of individuals, and the families distributed
equally over environments, we should find that $V_{1F_3}$ tended to
$\tfrac12 d^2 + \tfrac{1}{16} h^2$ while $V_{2F_3}$ tended to
$\tfrac14 d^2 + \tfrac34 V_{g_d} + \tfrac18 h^2 + \tfrac{3}{16} V_{g_h} + V_{(e + \frac14 g_h)}$, the
whole of the interactive variation tending to appear in the mean variance
of families. With smaller families, sampling variation would lead to
part of the interactive component appearing in $V_{1F_3}$. Again, if the
member individuals of each family were kept together within the
experimental design and not distributed randomly over the environments,
$V_{1F_3}$ must be inflated not merely by sampling variation, but also
by the environmental and associated interactive differences between
their different sites or plots. These must commonly be larger than the
environmental variation within the different sites or plots, upon which
will depend the e and g components of $V_{2F_3}$.

Despite this difficulty in partitioning the interactive component of 
variation among the variances within and between the families and 
groups distinguished by their ancestry within the population, the total 
interactive variation component is fixed and would appear to follow 
the same rules in drawing up the balance sheet of variation as do the 
components depending on d and h. [See Mather 1949b.] Thus in
$V_{F_2}$ the coefficient of $d^2$ is $\tfrac12$ and that of $h^2$ is $\tfrac14$. At the same time the
coefficient of h in the $F_2$ mean is $\tfrac12$. Squaring this last coefficient, so
that it can be combined with the coefficients of the two quadratic
components, and summing the three of them gives $\tfrac12 + \tfrac14 + (\tfrac12)^2 = 1$.
In the $F_3$ generation the mean is $\tfrac14 h$ and the coefficients of $d^2$ and $h^2$ in
$V_{F_3}$ ($= V_{1F_3} + V_{2F_3}$) are $\tfrac34$ and $\tfrac{3}{16}$ respectively. The quadratic total
is thus $\tfrac34 + \tfrac{3}{16} + (\tfrac14)^2 = 1$. We thus see that $d^2$ and $h^2$ components of
variation and the departure of the generation mean from the mid-parent 
(which is used as the origin in measurements of the phenotype) are 
alternative forms of expression of the genetical variation innate in the 
cross. We may alter the distribution of the variation among these 
categories, but the total amount to be distributed will be constant. 




The same is true of the interactive variation. The coefficient of
$V_{g_d}$ is the same as that of $d^2$, and the coefficient of $V_{g_h}$ like that of $h^2$. The
counterpart to the coefficient of h in the mean is the coefficient of $g_h$ in
that part of the interactive variation which is confounded with e.
In $V_{F_2}$ we have $\tfrac12 V_{g_d} + \tfrac14 V_{g_h} + V_{(e + \frac12 g_h)}$ so that the sum of quadratic
coefficients is $\tfrac12 + \tfrac14 + (\tfrac12)^2 = 1$. In $V_{F_3}$ the corresponding items are
$\tfrac34 V_{g_d} + \tfrac{3}{16} V_{g_h} + V_{(e + \frac14 g_h)}$ giving $\tfrac34 + \tfrac{3}{16} + (\tfrac14)^2 = 1$. The
contributions to mean and total variance of $F_4$ are thus expected to be
$\bar{F}_4 = \tfrac18 h$,
$V_{F_4} = \tfrac78 d^2 + \tfrac78 V_{g_d} + \tfrac{7}{64} h^2 + \tfrac{7}{64} V_{g_h} + V_{(e + \frac18 g_h)}$, though the
partitioning of the interactive terms and of $V_{(e + \frac18 g_h)}$ among the three
variances, $V_{1F_4}$, $V_{2F_4}$, and $V_{3F_4}$, defined by Mather [1949b] and Mather and
Vines [1952], will depend on the structure of the experiment. These results are
collected together in Table 5.


TABLE 5
BALANCE SHEET OF GENETICAL AND INTERACTIVE VARIABILITY


              Genetical variability   Mean                  Interactive variability   In E
Generation    D          H            (h)          Total    G_D        G_H            (g_h)        Total

F_1           -          -            (1)^2        1        -          -              (1)^2        1
F_2           1/2        1/4          (1/2)^2      1        1/2        1/4            (1/2)^2      1
F_3           3/4        3/16         (1/4)^2      1        3/4        3/16           (1/4)^2      1
F_4           7/8        7/64         (1/8)^2      1        7/8        7/64           (1/8)^2      1
F_5           15/16      15/256       (1/16)^2     1        15/16      15/256         (1/16)^2     1


When drawing up the balance sheet the coefficients of $\sum (h)$ in the departure of the generation
mean from the mid-parent, and of $\sum (g_h)$ in its contribution to the E item of variation, are squared
to make them summable with the coefficients of the quadratic quantities D, H, $G_D$, and $G_H$. See
the text for the definitions of these quantities.
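
The rows of the balance sheet follow a simple pattern, sketched below as an inference from the tabulated values and from the halving of heterozygote frequency under selfing, not as a formula stated in the text: if x = (1/2)^(n-1) is the proportion of heterozygotes in generation n, the coefficients are 1 - x for D, x(1 - x) for H, and x for the mean. The function names are illustrative.

```python
from fractions import Fraction


def balance_sheet(n):
    """Coefficients of D, H and of h in the mean for generation F_n
    derived by selfing from F1 (n = 1 is the F1 itself)."""
    x = Fraction(1, 2) ** (n - 1)   # proportion of heterozygotes
    coef_d = 1 - x                  # homozygotes split equally AA : aa
    coef_h = x * (1 - x)            # x h^2 less the squared mean (x h)^2
    coef_mean = x
    return coef_d, coef_h, coef_mean


def quadratic_total(n):
    """Sum of the D and H coefficients and the squared mean coefficient."""
    coef_d, coef_h, coef_mean = balance_sheet(n)
    return coef_d + coef_h + coef_mean ** 2
```

The same three coefficients apply to the interactive columns, with $G_D$, $G_H$ and the share of $g_h$ confounded with e in place of D, H and h, and the total is 1 in every generation.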


One further point is perhaps worthy of special note. The
non-heritable variation as measured by differences between the
environmental means is $V_{(e + \frac12 g_h)}$ in $F_2$, $V_{(e + \frac14 g_h)}$ in $F_3$ and so on. The
proportion of $g_h$ confounded with e is changing with the generation and
becoming smaller as inbreeding proceeds. The genotype-environment 
interaction is thus producing the effect of a fall in the environmental 
variation as measured in this way from generation to generation. 

Little need be said about covariances of $F_2$ and $F_3$, $F_3$ and $F_4$,
and so on. Where, as in a properly designed experiment, parents and
offspring are distributed independently of one another among the
environments so that their environmental circumstances are
uncorrelated, the covariances will contain terms in $d^2$ and $h^2$ only: neither e
nor $g_d$ and $g_h$ will contribute to them. The expressions for these
covariances will thus be as given by Mather [1949b]. With interactions,
however, the d’s and h’s must be expected to change from season to 
season, so that where the different generations are raised at different 
times, the values of the components of variation in the covariances 
cannot be assumed to be the same as their counterparts upon which 
the variances within a generation will depend. 


Randomly breeding populations


Where the alleles A and a are equally common, no portion of $g_d$
becomes confounded with e: the $g_d$ of AA and aa homozygotes balances
out in each generation. This will not, however, be the case where the
gene frequencies are not equal. Let u be the frequency of allele A and
v (= 1 − u) that of a. With random mating the frequencies of the
three genotypes become $u^2$ AA : $2uv$ Aa : $v^2$ aa, so that the overall
mean phenotype is $(u - v)d + 2uvh$. It can then be shown by the
methods used above that the phenotypic variance is

$$
\begin{aligned}
V &= u^2\bigl[d^2 + \sum (e_i + g_{di})^2\bigr] + v^2\bigl[d^2 + \sum (e_i - g_{di})^2\bigr]
     + 2uv\bigl[h^2 + \sum (e_i + g_{hi})^2\bigr] - [(u - v)d + 2uvh]^2 \\
  &= 2uv[d + h(v - u)]^2 + 2uv \sum [g_{di} + g_{hi}(v - u)]^2
     + 4u^2v^2h^2 + 4u^2v^2 \sum (g_{hi})^2 + \sum [e_i + (u - v)g_{di} + 2uv\,g_{hi}]^2
\end{aligned}
$$

which may be written as

$$
V = 2uv[d + h(v - u)]^2 + 2uv\,V_{[g_d + g_h(v - u)]}
    + 4u^2v^2h^2 + 4u^2v^2 V_{g_h} + V_{[e + (u - v)g_d + 2uv\,g_h]} .
$$

As before, each term in d and h is matched by a corresponding
term in $g_d$ and $g_h$; and, again as earlier, the environmental variance has
confounded with it a part of the $g_h$ effect. Now, however, a part of $g_d$,
the environmental interaction of homozygotes, is also confounded with e.
This reflects, of course, the appearance of $g_d$ as well as $g_h$ items in the
environmental means, and indeed $V_{[e + (u - v)g_d + 2uv\,g_h]}$ is again the
non-heritable variance as measured by differences among the environmental
means.
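
The variance of the randomly mating population can be verified by enumeration in the same way as that of F2. The sketch assumes the four-environment sign patterns of Table 4, genotype frequencies $u^2 : 2uv : v^2$ in every environment, and variances taken with the number of cells as divisor; function names and trial values are hypothetical.

```python
SIGNS = [(1, 1, 1), (1, -1, -1), (-1, 1, -1), (-1, -1, 1)]  # W, X, Y, Z


def signed_sum(signs, values):
    """Algebraic sum of the three comparisons for one environment."""
    return sum(s * x for s, x in zip(signs, values))


def variance_direct(u, d, h, e, g_d, g_h):
    """Phenotypic variance by enumeration over genotypes and environments."""
    v = 1 - u
    freq = {"AA": u * u, "Aa": 2 * u * v, "aa": v * v}
    cells = []
    for s in SIGNS:
        env = signed_sum(s, e)
        value = {
            "AA": d + env + signed_sum(s, g_d),
            "Aa": h + env + signed_sum(s, g_h),
            "aa": -d + env - signed_sum(s, g_d),
        }
        cells += [(freq[gt] / len(SIGNS), value[gt]) for gt in freq]
    mean = sum(w * x for w, x in cells)
    return sum(w * (x - mean) ** 2 for w, x in cells)


def variance_formula(u, d, h, e, g_d, g_h):
    """Closed form with the interactive terms matching those in d and h."""
    v = 1 - u
    v_mix = sum((gd + gh * (v - u)) ** 2 for gd, gh in zip(g_d, g_h))
    v_gh = sum(gh ** 2 for gh in g_h)
    v_env = sum((ei + (u - v) * gd + 2 * u * v * gh) ** 2
                for ei, gd, gh in zip(e, g_d, g_h))
    return (2 * u * v * (d + h * (v - u)) ** 2 + 2 * u * v * v_mix
            + 4 * u * u * v * v * h * h + 4 * u * u * v * v * v_gh + v_env)
```

With u = 1/2 the closed form reduces to the F2 expression, the share of $g_d$ in the environmental term vanishing as stated in the text.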

Where their environments are independent of one another, the
covariance of parent and offspring in respect of the gene under
consideration (see Mather 1949a) will be $W_{P/O} = uv[d + h(v - u)]^2$. But where
the environments of parent and offspring are completely correlated, in the
sense that the offspring fall into the same environment as the parent,
it is not difficult to show by an adaptation of Mather's [1949a] methods
that
$W_{P/O} = uv[d + h(v - u)]^2 + uv\,V_{[g_d + g_h(v - u)]} + V_{[e + (u - v)g_d + 2uv\,g_h]}$.
The correlation of environments thus reintroduces not merely the last
term, measuring differences between the environmental means, but also
the term in $g_d$ and $g_h$ corresponding to the main genetical term in d and h.

The sib covariance in respect of this gene is similarly
$W_{S/S} = uv[d + h(v - u)]^2 + u^2v^2h^2$ where the sibs' environments are
independent of one another, and

$$
W_{S/S} = uv[d + h(v - u)]^2 + u^2v^2h^2 + uv\,V_{[g_d + g_h(v - u)]}
          + u^2v^2 V_{g_h} + V_{[e + (u - v)g_d + 2uv\,g_h]}
$$

where the environments of the sibs are always alike.

Where, therefore, analysis is made from the data supplied by correla- 
tions between relatives in independent environments [see Mather 
1949a] the presence of genotype-environment interactions will lead to 
underestimation of the genetical components of variation, unless they 
are specifically accommodated in the calculations, for every item in d
and h will be accompanied by a corresponding term in $g_d$ or $g_h$ in the
variances forming the denominators of the correlation coefficients but
not in the covariances which form the numerators. With fully correlated
environments, on the other hand, the genetical components will be
overestimated unless it is recognized that the covariances contain the
same items in e, $g_d$, and $g_h$ as the variances. The estimates of the
contributions made by d and h to the variation will also be increased
because of their always being accompanied by the corresponding terms
in $g_d$ and $g_h$. The resulting distortion may not, of course, seriously
affect the estimated values of the d and h components relative to one
another. Distortion relative to the non-heritable variation will be
reduced by the appearance of $g_d$ and $g_h$ with e in this component of
variation, but further investigation will be necessary before its full 
effects can be assessed with any accuracy. 
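
The direction of the two biases can be illustrated with the single-gene expressions for the parent-offspring covariance and the phenotypic variance. In this sketch the interactive and environmental variances are supplied directly as numbers (`v_mix` standing for the variance of $g_d + g_h(v - u)$, `v_gh` for that of $g_h$, and `v_env` for the variance of environmental means); these argument names and the trial values are assumptions made for illustration.

```python
def parent_offspring_correlation(u, d, h, v_mix=0.0, v_gh=0.0, v_env=0.0,
                                 correlated=False):
    """Parent-offspring correlation for one gene under random mating.

    correlated=False: parent and offspring environments are independent.
    correlated=True:  offspring fall into the parent's environment.
    """
    v = 1 - u
    additive = u * v * (d + h * (v - u)) ** 2
    total = (2 * additive + 4 * u * u * v * v * h * h
             + 2 * u * v * v_mix + 4 * u * u * v * v * v_gh + v_env)
    covariance = additive
    if correlated:
        # Interactive and environmental terms re-enter the covariance.
        covariance += u * v * v_mix + v_env
    return covariance / total
```

With u = 0.5, d = 2, h = 0 and an environmental variance of 1, the correlation is 1/3 in the absence of interaction; adding an interactive variance of 1 lowers it to 1/3.5 when environments are independent and raises it to 2.25/3.5 when they are fully correlated.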

Partial dependence of the environment of offspring on that of parents, 
or that of sibs on one another, would be expected to have effects between 
those of the two limiting cases we have considered. This case of partially 
correlated environments also requires further investigation. 


More than one gene 


Where genes are independent both in their actions (i.e. show no 
non-allelic interaction) and in their distributions (i.e. show no effect of 
linkage), the contributions they make to the components of variation 
will be simply additive. The family means, variances, and covariances 
are thus obtained by summing appropriately over all genes. The
variance of $F_2$, for example, becomes
$V_{F_2} = \tfrac12 D + \tfrac12 G_D + \tfrac14 H + \tfrac14 G_H + E_{F_2}$, where
$D = \sum (d^2)$, $H = \sum (h^2)$, $G_D = \sum (V_{g_d})$ and
$G_H = \sum (V_{g_h})$. E requires special note. In the other four cases
summation is of the quadratic quantities themselves. E, however, is
the variance of environmental means and so will be given by
$V_{[e + \frac12 \sum (g_h)]}$.

The overall variance of $F_3$ will similarly be
$V_{F_3} = \tfrac34 D + \tfrac34 G_D + \tfrac{3}{16} H + \tfrac{3}{16} G_H + E_{F_3}$, and
$E_{F_3} = V_{[e + \frac14 \sum (g_h)]}$ is not the same as $E_{F_2}$.
The covariance of $F_2$ parent and $F_3$ family mean, assuming uncorrelated
environments, will be simply $W_{1F_{23}} = \tfrac12 D + \tfrac18 H$, where now
$D = \sum dd'$ and $H = \sum hh'$, the primes indicating the value of d or h in
the parental environment.

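The variance of a randomly breeding population will be

$$V = \tfrac12 D + \tfrac12 G_D + \tfrac14 H + \tfrac14 G_H + E$$

where

$$D = \sum \{4u_a v_a [d_a + h_a(v_a - u_a)]^2\},$$
$$H = \sum (16 u_a^2 v_a^2 h_a^2),$$
$$G_D = \sum \{4u_a v_a V_{[g_{da} + g_{ha}(v_a - u_a)]}\},$$
$$G_H = \sum (16 u_a^2 v_a^2 V_{g_{ha}}),$$

and

$$E = V_{[e + \sum \{(u_a - v_a) g_{da} + 2u_a v_a g_{ha}\}]} .$$

With uncorrelated environments, $W_{P/O} = \tfrac14 D$ and
$W_{S/S} = \tfrac14 D + \tfrac{1}{16} H$, but when the environments are fully correlated

$$W_{P/O} = \tfrac14 D + \tfrac14 G_D + E \quad \text{and} \quad
  W_{S/S} = \tfrac14 D + \tfrac14 G_D + \tfrac{1}{16} H + \tfrac{1}{16} G_H + E,$$

as we have already seen. According to which is the situation, $G_D$ will
always be included in a common estimate with D and $G_H$ with H or
both will be included with E, unless there exists some means of varying
the environmental correlation.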

Where the non-allelic genes do not act independently in producing 
their effects, the situation can be described by introducing parameters 
i, j, and l to take account of the interactions [Hayman and Mather
1955]. The means of true-breeding parent lines and $F_1$ contain items
in i, j, and l, but the mean of $F_2$ does not. The variance of $F_2$, however,
will contain new terms in $\sum (i^2)$, $\sum (j^2)$, and $\sum (l^2)$ in addition to D and H.
Now, just as g items can be introduced to describe and accommodate
the genotype-environment interaction corresponding to d and h, so
$g_i$, $g_j$, and $g_l$ items can be brought in to cover the interactions with
environment corresponding to i, j, and l. Furthermore, these will be
as independent of i, j, and l as $g_d$ and $g_h$ are of d and h, and as independent
of one another as are i, j, and l. Thus in $F_2$ we should expect the
covariance to include terms in $G_I = \sum (V_{g_i})$, $G_J = \sum (V_{g_j})$, and
$G_L = \sum (V_{g_l})$ with coefficients like those of $I = \sum (i^2)$, $J = \sum (j^2)$, and
$L = \sum (l^2)$. Equally when the j items become partly confounded
with the d's, and the l's with the h's, in the variance of $F_3$, we should
expect similar confounding of $g_j$ with $g_d$ and $g_l$ with $g_h$. Detailed
investigation of these relations remains a task for the future; but if
these expectations are valid, it would appear that the
genotype-environment interactions should not in principle vitiate the basis of
recognition of interactions among the non-allelic genes themselves.

Turning next to the effects of linkage, it appears that the variance
of an $F_2$ segregating for the two genes A-a and B-b, linked with
a recombination value of p, is given by

$$
\begin{aligned}
V_{F_2} ={}& \tfrac12 [d_a^2 + d_b^2 \pm 2(1 - 2p) d_a d_b] \\
 &+ \tfrac12 [V_{g_{da}} + V_{g_{db}} \pm 2(1 - 2p) W_{g_{da} g_{db}}] \\
 &+ \tfrac14 [h_a^2 + h_b^2 + 2(1 - 2p)^2 h_a h_b] \\
 &+ \tfrac14 [V_{g_{ha}} + V_{g_{hb}} + 2(1 - 2p)^2 W_{g_{ha} g_{hb}}] \\
 &+ V_{[e + \frac12 (g_{ha} + g_{hb})]}
\end{aligned}
$$

the terms in $d_a d_b$ and $W_{g_{da} g_{db}}$ being positive for the coupling and negative
for the repulsion phases of linkage.

Again it will be seen that the interaction terms in $g_d$ and $g_h$ match
in structure and coefficients the corresponding terms in d and h, and
that the final term includes $g_h$, as well as e in the way already noted
in the absence of linkage. Indeed this last term must be independent
of the linkage relation as it reflects differences among the environmental
means which are themselves independent of the linkage relation between
the genes.
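
The structure of the linked F2 variance can be checked by enumerating the sixteen gamete pairings over four environments. The sketch below rests on several assumptions: the sign patterns of Table 4 for the environments, F1 gametes in the proportions (1 - p)/2 for each parental and p/2 for each recombinant type, $V_x$ taken as the sum of squares of the three comparisons of x, and $W_{xy}$ as the corresponding sum of products. Function names and trial values are illustrative.

```python
from itertools import product

SIGNS = [(1, 1, 1), (1, -1, -1), (-1, 1, -1), (-1, -1, 1)]  # W, X, Y, Z


def dot(x, y):
    return sum(a * b for a, b in zip(x, y))


def gamete_freqs(p, coupling=True):
    """Gametes as (A allele, B allele), 1 denoting the capital allele."""
    par, rec = (1 - p) / 2, p / 2
    if coupling:   # F1 is AB/ab
        return {(1, 1): par, (0, 0): par, (1, 0): rec, (0, 1): rec}
    return {(1, 0): par, (0, 1): par, (1, 1): rec, (0, 0): rec}  # Ab/aB


def locus_value(n_caps, s, d, h, g_d, g_h):
    """Contribution of one locus carrying n_caps capital alleles."""
    if n_caps == 1:
        return h + dot(s, g_h)
    sign = 1 if n_caps == 2 else -1
    return sign * (d + dot(s, g_d))


def f2_variance_direct(p, coupling, loci, e):
    gam = gamete_freqs(p, coupling)
    cells = []
    for (g1, f1), (g2, f2) in product(gam.items(), repeat=2):
        for s in SIGNS:
            value = dot(s, e)
            for k, locus in enumerate(loci):
                value += locus_value(g1[k] + g2[k], s, **locus)
            cells.append((f1 * f2 / len(SIGNS), value))
    mean = sum(w * x for w, x in cells)
    return sum(w * (x - mean) ** 2 for w, x in cells)


def f2_variance_formula(p, coupling, loci, e):
    a, b = loci
    c1 = (1 if coupling else -1) * 2 * (1 - 2 * p)
    c2 = 2 * (1 - 2 * p) ** 2
    add = 0.5 * (a["d"] ** 2 + b["d"] ** 2 + c1 * a["d"] * b["d"])
    add_g = 0.5 * (dot(a["g_d"], a["g_d"]) + dot(b["g_d"], b["g_d"])
                   + c1 * dot(a["g_d"], b["g_d"]))
    dom = 0.25 * (a["h"] ** 2 + b["h"] ** 2 + c2 * a["h"] * b["h"])
    dom_g = 0.25 * (dot(a["g_h"], a["g_h"]) + dot(b["g_h"], b["g_h"])
                    + c2 * dot(a["g_h"], b["g_h"]))
    env = sum((ei + 0.5 * (x + y)) ** 2
              for ei, x, y in zip(e, a["g_h"], b["g_h"]))
    return add + add_g + dom + dom_g + env
```

Changing p alters the terms in $d_a d_b$, $h_a h_b$ and the two W's but leaves the final environmental term untouched, in keeping with its independence of the linkage relation.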

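The overall variance of the $F_3$ generation is found as

$$
\begin{aligned}
V_{F_3} ={}& \tfrac12 [d_a^2 + d_b^2 \pm 2(1 - 2p) d_a d_b] \\
 &+ \tfrac12 [V_{g_{da}} + V_{g_{db}} \pm 2(1 - 2p) W_{g_{da} g_{db}}] \\
 &+ \tfrac14 [d_a^2 + d_b^2 \pm 2(1 - 2p)^2 d_a d_b] \\
 &+ \tfrac14 [V_{g_{da}} + V_{g_{db}} \pm 2(1 - 2p)^2 W_{g_{da} g_{db}}] \\
 &+ \tfrac{1}{16} [h_a^2 + h_b^2 + 2(1 - 2p)^2 h_a h_b] \\
 &+ \tfrac{1}{16} [V_{g_{ha}} + V_{g_{hb}} + 2(1 - 2p)^2 W_{g_{ha} g_{hb}}] \\
 &+ \tfrac18 [h_a^2 + h_b^2 + 2(1 - 2p)^2 (1 - 2p + 2p^2) h_a h_b] \\
 &+ \tfrac18 [V_{g_{ha}} + V_{g_{hb}} + 2(1 - 2p)^2 (1 - 2p + 2p^2) W_{g_{ha} g_{hb}}] \\
 &+ V_{[e + \frac14 (g_{ha} + g_{hb})]} .
\end{aligned}
$$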




The final term is of course that already found for $V_{F_3}$ in the absence of
linkage. The terms in $d_a$, $d_b$, $h_a$, and $h_b$ separate as shown into the parts
established by Mather [1949a and b] for $V_{1F_3}$ and $V_{2F_3}$ where linkage is
operative. The terms in $g_d$ and $g_h$ are composed correspondingly but will
partition between $V_{1F_3}$ and $V_{2F_3}$ according to the structure of the
experiment as we saw for the case where linkage is absent.

These formulae for $V_{F_2}$ and $V_{F_3}$ can be generalised in terms of
D, H, $G_D$, and $G_H$ and E; but it should be noted that the definitions of all
these quantities will change with the generation and in particular with
the rank of the variance. As in earlier cases, the covariance of $F_2$ parent
and $F_3$ family mean will be independent of $g_d$, $g_h$, and e provided that
the distributions of parents and progeny among the environments are
uncorrelated.


Summary 


The differences among the four phenotypes given by two genotypes
in each of two environments can be described in terms of three
parameters, $d_a$ measuring the average effect of the genic difference, $e_1$
measuring the average effect of difference in environment, and $g_a$ measuring
the interaction of genotype and environment. This quantity $g_a$ is the
statistical interaction of $d_a$ and $e_1$. The relations possible among the
four phenotypes, listed by Haldane [1946], are definable by the equalities
and inequalities of $d_a$, $e_1$, and $g_a$.

Two interaction parameters, $g_d$ and $g_h$, in addition to the genetic
parameters d and h and the environmental parameter e, are needed to
describe the differences among the six phenotypes given by the genotypes
AA, Aa, and aa in two environments. More than two environments
can be accommodated by partitioning the environmental differences
into orthogonal e components. The genotype-environment interactions
are describable in terms of corresponding sets of orthogonal g components.
Where e, $g_d$, and $g_h$ are the summed increments added to the phenotype
by the environmental and interactive effects in each environment, the
variances of segregating families include terms depending on $V_{g_d}$ and
$V_{g_h}$, and E, the term reflecting variation between environments being
inflated by part of the $g_h$ interaction. The terms in $V_{g_d}$ and $V_{g_h}$ take
coefficients corresponding to those of D and H in $V_{F_2}$ and also in the
total variances of later generations; but $V_{g_d}$ and $V_{g_h}$ do not partition
between the variances of different rank in the way shown by D and H.
The contribution of $g_h$ to the environmental variance changes
characteristically with generation.

The variances of randomly breeding populations include terms in
$g_d$ and $g_h$ corresponding to those in d and h. Both $g_d$ and $g_h$ contribute
to the E component of variation. The effects of correlations of
environment on the covariances between relatives are discussed.

Where more than one gene is involved, the variances include terms
in $G_D = \sum (V_{g_d})$ and $G_H = \sum (V_{g_h})$ corresponding to those in D and
H respectively. $\sum (g_h)$ contributes to the E component of variation.
Compound interactions of pairs of genes with environment can be
accommodated by terms in $G_I$, etc., corresponding to genic interaction
terms in $I = \sum (i^2)$, etc. Where genes are linked $G_D$ and $G_H$ depend
on the covariances of the different $g_d$'s and $g_h$'s in a way corresponding
to the dependence of D and H on the products of the different d's
and h's.


REFERENCES 


Anderson, V. L. and Kempthorne, O. [1954]. A model for the study of quantitative
inheritance. Genetics 39, 883-98.
Cockerham, C. C. [1954]. An extension of the concept of partitioning hereditary
variance for analysis of covariances among relatives when epistasis is present.
Genetics 39, 859-82.
Fisher, R. A. [1918]. The correlation between relatives on the supposition of
Mendelian inheritance. Trans. Roy. Soc. Edin. 52, 399-433.
Haldane, J. B. S. [1946]. The interaction of nature and nurture. Ann. Eugenics 13,
197-205.
Hayman, B. I. and Mather, K. [1955]. The description of genic interactions in
continuous variation. Biometrics 11, 69-82.
Mather, K. [1943]. Statistical Analysis in Biology. Methuen, London.
Mather, K. [1949a]. Biometrical Genetics. Methuen, London.
Mather, K. [1949b]. The genetical theory of continuous variation. Proc. 8th Int.
Cong. Genetics. Hereditas (Suppl. Vol.), 376-401.
Mather, K. [1955]. Response to selection. Cold Spring Harbor Symp. Quant. Biol. 20,
158-65.
Mather, K. and Vines, A. [1952]. The inheritance of height and flowering time in
a cross of Nicotiana rustica. Quantitative Inheritance. ed. E. C. Reeve and
C. H. Waddington. 49-79. H.M.S.O.




THE ANALYSIS OF VARIANCE AND DERIVATION OF 
STANDARD ERRORS FOR INCOMPLETE DATA 


G. N. WILKINSON 


Division of Mathematical Statistics 
Commonwealth Scientific and Industrial Research Organization 
Adelaide, Australia 


1. INTRODUCTION 


In a previous paper [10] based on the fundamental paper by Yates 
[11], the writer has discussed the estimation of missing observations in 
incomplete data. When the data are completed with these estimates, 
standard calculations give the correct estimates of treatment effects, 
etc., and a standard analysis of variance yields the correct residual 
sum of squares for the estimation of error. 
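
As a concrete instance of completing incomplete data, the sketch below applies the standard textbook formula for a single missing value in a randomized blocks layout, x = (bB + tT - G) / ((b - 1)(t - 1)), where B and T are the totals of the remaining observations in the affected block and treatment and G is the total of all observations present. This is offered only as an illustration of the idea and is not the general procedure of the papers cited; the function name and data layout are assumptions.

```python
def yates_missing_value(table):
    """Estimate the single missing value (None) in a randomized blocks
    table given as a list of blocks, each a list over treatments."""
    b, t = len(table), len(table[0])
    missing = [(r, c) for r in range(b) for c in range(t)
               if table[r][c] is None]
    if len(missing) != 1:
        raise ValueError("expects exactly one missing observation")
    (i, j), = missing
    block_total = sum(x for x in table[i] if x is not None)
    treatment_total = sum(table[r][j] for r in range(b) if r != i)
    grand_total = sum(x for row in table for x in row if x is not None)
    return (b * block_total + t * treatment_total - grand_total) / ((b - 1) * (t - 1))
```

The estimate is the value for which the residual of the completed cell is zero, so that the completed data yield the same residual sum of squares as a least squares fit to the observations actually present. In a perfectly additive table such as blocks (1, 2, 3), (2, 3, 4), (4, 5, missing) the estimate is 6.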

However, as Yates [11] observed, other component sums of squares 
in the standard analysis are incorrect (though perhaps not seriously so). 
For the randomized blocks design, and for a Latin square with one 
missing value, Yates derived formulae for adjusting the treatment sum 
of squares to its correct value. Cornish [2, 3] gave the corresponding 
results for incomplete block designs, and for designs such as lattice 
squares, with one missing value. 

The present paper, following on from the previous paper [10], 
deals with correcting the standard analysis of variance, and derives a 
general formula for the necessary corrections. Specific formulae are
given which, in particular, provide the necessary correction of the 
treatment sum of squares when several observations are missing, for 
the designs with two-way restriction. 

The standard formulae for determining variances and covariances of 
treatment comparisons, etc., will also need to be adjusted when the 
comparisons involve missing values. Tocher [8] gave a general formula 
for this purpose, which is here extended to cover the singular cases. 

The principles discussed are illustrated by application to the analysis 
of a Latin square experiment (numerical example), and in the derivation 
of an analysis for B.I.B. designs with missing blocks. 




2. THE ANALYSIS OF VARIANCE 
Preliminary considerations 


The term “analysis of variance” is rather a general one. In this
paper we shall be dealing specifically with partitions of a total sum of
squares of observations, formally derivable by the method of unweighted
least squares. The term “analysis of variance” will be used with this
sense.

We assume that each observation is a linear combination of param- 
eters, representing the effects of various experimental factors, plus an 
error term. In practice, some of the effects specified by the model may 
be fixed, and others of a stochastic nature. This and other statistical 
properties of the model, though relevant for determining the properties 
of an analysis of variance, will not be required here in the formal deriva- 
tion of the analysis. 

The principles of construction of an analysis of variance are well 
known. The formulation given here is the most convenient for present 
purposes: 

Suppose f factors 1, 2, ..., f are under consideration. A typical
factor might be “Blocks” or “Treatments.” “Treatments,” in particular,
might be represented by several sub-factors, as in a factorial
design. For completeness we include a “correction” factor, corresponding
to the general mean.

The linear model which takes into account the effects of all factors 
on the observations is usually referred to as the full model, and gives rise 
to the residual sum of squares in the analysis of variance. 

Apart from this residual term, a typical component sum of squares 
in an analysis of variance could be described in full (with an appropriate 
renumbering of factors) as 


the variation ascribable to the effects of factors 1 to i, when the variation 
ascribable to factors (i + 1) to j is eliminated, and the effects of factors 
(j + 1) to f are ignored. 


The sum of squares corresponding to this description may be expressed 
as the difference between the residual sums of squares obtained when 
two models are fitted to the data, the one which takes into account the 
factors 1 to j, and the other which takes into account only the factors 
(i + 1) to j, other factors being ignored.

In particular, if only the correction factor is eliminated, the latter 
model simply specifies that all observations in the design have the 
same mean. This model may be termed the uniform model. The 




362 BIOMETRICS, SEPTEMBER 1958 


corresponding residual sum of squares is simply the corrected total 
sum of squares of the observations. 

In certain analyses, component sums of squares appear in which no 
variation ascribable to other factors is eliminated. To include these in 
the above formulation, we define a null model, which specifies that all 
observations have zero mean. The corresponding residual sum of 
squares is just the crude total sum of squares of the observations. 

With this last-mentioned device, every component sum of squares 
in an analysis of variance, apart from the residual term, may be expressed 
as the difference between two residual sums of squares. 

To avoid confusion, models other than the full model will be referred
to as auxiliary models, and the corresponding residual sums of squares
will likewise be termed auxiliary. The residual term in the analysis of
variance, which corresponds to the full model, will be referred to where
necessary as the main residual sum of squares.
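The formulation above can be sketched numerically. The following is a minimal illustration, not part of the original paper: it assumes a hypothetical layout of 3 blocks by 4 treatments with invented data, and the helper name `residual_ss` is ours. Each component sum of squares is obtained as the difference between the residual sums of squares of two fitted models.

```python
import numpy as np

def residual_ss(X, z):
    """Residual sum of squares after an unweighted least-squares fit of z on X."""
    if X.shape[1] == 0:                      # null model: zero mean everywhere
        return float(z @ z)
    coef, *_ = np.linalg.lstsq(X, z, rcond=None)
    resid = z - X @ coef
    return float(resid @ resid)

# Hypothetical layout: 3 blocks x 4 treatments, one observation per cell.
blocks = np.repeat(np.arange(3), 4)
treats = np.tile(np.arange(4), 3)
z = np.array([4.1, 5.0, 6.2, 5.5, 3.9, 5.2, 6.8, 5.1, 4.6, 5.9, 7.0, 6.3])

one = np.ones((12, 1))                       # "correction" factor (general mean)
B = np.eye(3)[blocks]                        # block indicators
T = np.eye(4)[treats]                        # treatment indicators

rss_null = residual_ss(np.empty((12, 0)), z)        # crude total SS
rss_uniform = residual_ss(one, z)                   # corrected total SS
rss_blocks = residual_ss(np.hstack([one, B]), z)    # an auxiliary model
rss_full = residual_ss(np.hstack([one, B, T]), z)   # main residual SS

# Treatments, eliminating blocks: difference of two residual sums of squares.
ss_treatments = rss_blocks - rss_full
# Correction for the mean, nothing eliminated: this needs the null model.
ss_mean = rss_null - rss_uniform
```

Because this hypothetical design is complete and orthogonal, `ss_treatments` coincides with the usual treatment sum of squares computed from treatment means.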


Correction of a standard analysis 


When a set of observations is incomplete in relation to a given 
experimental design, and missing values are estimated according to the 
full model, a standard analysis of variance on the completed set of 
observations will yield the correct residual sum of squares for the 
incomplete data. Usually this will be the best method of determining 
the residual sum of squares. 

However, the estimates of missing values derived according to the 
full model are generally incorrect with respect to the auxiliary models. 
Their use in a standard analysis will therefore lead to exaggerated values 
for auxiliary residual sums of squares, and thus give incorrect values 
for the component sums of squares other than the main residual term. 

Correct auxiliary residual sums of squares could be determined by 
computing auxiliary analyses of variance (with auxiliary estimates of 
missing values, if necessary), and the main analysis modified accordingly. 

However, it will usually be simpler to compute a set of corrections, 
each of which adjusts the apparent value of an auxiliary residual sum of 
squares to the correct value. Each component in the standard analysis, 
since expressible as a difference between two residual sums of squares, 
can then be adjusted to its correct value by applying the appropriate 
pair of corrections (or only one if the main residual sum of squares is 
involved). 

The general formula for these corrections is given in section 4, with 
specific formulae for the important cases in section 5. The general 
formula also covers the singular cases, such as when whole blocks or 
treatments are missing, in which the estimates of missing values are to 




some extent arbitrarily determined. It may be noted that in these 
cases the corrections will remove any variation in the standard analysis 
that is attributable to arbitrarily determined effects, giving a well- 
determined analysis for the incomplete data. 

Special considerations apply to the analysis of orthogonal designs. 
When one factor is orthogonal to other factors, the variation ascribable 
to this factor is the same whether the variation ascribable to the other 
factors is eliminated or ignored. Consequently one analysis in the 
orthogonal situation represents two or more analyses that would be 
logically and numerically distinct in the non-orthogonal situation. 

As the incomplete data will usually be non-orthogonal, it will be 
necessary, in principle, to replace the standard analysis by the two or 
more corrected analyses appropriate to the incomplete design. 

In practice, however, only a few component sums of squares, taken 
from the appropriate analyses, will be required for statistical purposes. 
Since they can be computed directly from the standard analysis as 
described above, these components may be presented in a single table 
that might be called an analysis of variance (in a wider sense than used 
here), but which would not have the usual additive property. Reference 
to the example in section 7 will make this clear. 


Adjustment of the degrees of freedom 


In the non-singular case, when the missing values are well deter- 
mined, it is necessary only to reduce the residual (and total) degrees of 
freedom by the number of missing values. In the singular case, the 


necessary adjustments, if not otherwise obvious, may be determined 
by the following rule: 


Reduce the degrees of freedom for each residual sum of squares (main 
or auxiliary) by the rank of the corresponding equations for missing 
values. 


Correction of an analysis of covariance 


It has been pointed out in [9], [10] that if the missing values of a 
dependent variate are estimated ignoring the concomitant variates, and 
if the corresponding values of concomitant variates are replaced by 
estimates similarly determined, then the analysis of covariance of the 
data completed in this way will yield the correct residual sums of squares 
and products, from which the residual sum of squares (eliminating 
covariant effects) may be determined in the usual manner. 

To determine correct auxiliary residual sums of squares and products 
from the standard analysis, corrections of a similar form to those 




mentioned above may be applied, the necessary correction formulae 
being given in section 4. The procedure has been described and illus- 
trated in [9]. 


3. NOTATION, PRELIMINARY FORMULAE 


Let z denote a vector of observations of a variate y, formally com- 
plete in relation to a given experimental design. With respect to a given 
linear model, the vector z may be expressed as the sum of two com- 
ponents, 


$$z = Ez + Rz,$$


Ez being the vector of estimated expected values for z, and Rz the 
vector of residuals. The vector z may be represented as a point in a 
vector space, and the linear model specifies a certain subspace of this 
space, in which the true expectation point must lie. The principle of 
least squares selects as the estimating point, the point in the subspace 
nearest to z, that is, such that z’R’Rz is a minimum. Ez is therefore 
the orthogonal projection of z on this subspace, which implies that the 
matrices E, R are symmetric and idempotent (E’ = E, R’ = R), and 
the residual sum of squares is therefore z’Rz. 
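As an illustration of this geometry (a sketch under assumptions, not the paper's own computation), the operators E and R can be formed explicitly for a small hypothetical design. The design, the data, and the use of the Moore-Penrose inverse to build the projection are all our choices.

```python
import numpy as np

# Hypothetical full model: general mean + 3 blocks + 4 treatments.
blocks = np.repeat(np.arange(3), 4)
treats = np.tile(np.arange(4), 3)
X = np.hstack([np.ones((12, 1)), np.eye(3)[blocks], np.eye(4)[treats]])

E = X @ np.linalg.pinv(X, rcond=1e-10)   # orthogonal projection on the model subspace
R = np.eye(12) - E                       # residual operator, so that z = Ez + Rz

z = np.array([4.1, 5.0, 6.2, 5.5, 3.9, 5.2, 6.8, 5.1, 4.6, 5.9, 7.0, 6.3])
rss = float(z @ R @ z)                   # residual sum of squares z'Rz
```

The trace of R equals the residual degrees of freedom, here 12 - 6 = 6, since the model matrix has rank 6.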

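Suppose now some observations are missing. Let y denote the vector
of existing observations, and u a set of missing values estimated according
to the given model. Let $z = \begin{bmatrix} y \\ u \end{bmatrix}$ and let R be partitioned accordingly,

$$R = \begin{bmatrix} R_y \\ R_u \end{bmatrix} = \begin{bmatrix} R_{yy} & R_{yu} \\ R_{uy} & R_{uu} \end{bmatrix}.$$

(The matrix E will not be required further.)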
Equations for the missing values u were derived and discussed in the
previous paper [10]. The results given there are summarized below in
the present matrix notation:

The equations for the missing values are

$$R_u z = R_{uu}u + R_{uy}y = 0, \tag{1}$$

and the residual sum of squares for the incomplete data is correctly
determined as $z'Rz$ (the usual degrees of freedom being reduced by the
rank of $R_{uu}$).

If $R_{uu}$ is non-singular, the equations (1) have the unique solution

$$u = -R_{uu}^{-1}R_{uy}y. \tag{2}$$
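A minimal numerical sketch of equations (1) and (2) follows. It assumes a hypothetical randomized block layout with two cells treated as missing; the data and the choice of missing cells are invented for illustration. Note that the code keeps the observations in their design order rather than reordering them as (y, u).

```python
import numpy as np

blocks = np.repeat(np.arange(3), 4)
treats = np.tile(np.arange(4), 3)
X = np.hstack([np.ones((12, 1)), np.eye(3)[blocks], np.eye(4)[treats]])
R = np.eye(12) - X @ np.linalg.pinv(X, rcond=1e-10)

data = np.array([4.1, 5.0, 6.2, 5.5, 3.9, 5.2, 6.8, 5.1, 4.6, 5.9, 7.0, 6.3])
miss = np.array([1, 6])                       # cells treated as missing (assumed)
obs = np.setdiff1d(np.arange(12), miss)
y = data[obs]

R_uu = R[np.ix_(miss, miss)]
R_uy = R[np.ix_(miss, obs)]
u = -np.linalg.solve(R_uu, R_uy @ y)          # equation (2)

z = data.copy()
z[miss] = u                                   # completed data
rss_completed = float(z @ R @ z)

# Reference value: fit the same model to the existing observations only.
coef, *_ = np.linalg.lstsq(X[obs], y, rcond=None)
rss_direct = float(np.sum((y - X[obs] @ coef) ** 2))
```

The residuals of the completed data vanish at the missing cells, and `rss_completed` agrees with `rss_direct`, which is the property stated in the text.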


If $R_{uu}$ is singular, an effective inverse of $R_{uu}$, for which we shall use the
same symbol $R_{uu}^{-1}$, provides a solution in the form (2). The matrix
$R_{uu}^{-1}$ must satisfy the relation

$$R_{uu}R_{uu}^{-1}R_{uu} = R_{uu}. \tag{3}$$

If F denotes the symmetric idempotent matrix which gives the orthogonal
projection on the range of $R_{uu}$, relation (3) implies

$$R_{uu}R_{uu}^{-1}F = FR_{uu}^{-1}R_{uu} = F. \tag{4}$$

As was mentioned in [10], there is a unique solution $u_F = Fu$ in the
range of $R_{uu}$, and any other solution of the equations (1) can be expressed
in the form

$$u = u_F + v, \tag{5}$$

where v is any arbitrary vector orthogonal to the range of $R_{uu}$, that is, such that
Fv = 0. Thus Fu represents the estimable component of u, and the
linear functions orthogonal to Fu, represented by Gu, where G =
I - F, may be arbitrarily determined.
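The singular case can be illustrated in the same hypothetical randomized block layout by treating a whole block as missing. The sketch below uses the Moore-Penrose inverse as one possible effective inverse; this is our assumption, since any matrix satisfying (3) would serve.

```python
import numpy as np

blocks = np.repeat(np.arange(3), 4)
treats = np.tile(np.arange(4), 3)
X = np.hstack([np.ones((12, 1)), np.eye(3)[blocks], np.eye(4)[treats]])
R = np.eye(12) - X @ np.linalg.pinv(X, rcond=1e-10)

data = np.array([4.1, 5.0, 6.2, 5.5, 3.9, 5.2, 6.8, 5.1, 4.6, 5.9, 7.0, 6.3])
miss = np.arange(8, 12)                       # the whole third block is missing
obs = np.arange(8)
y = data[obs]

R_uu = R[np.ix_(miss, miss)]
R_uy = R[np.ix_(miss, obs)]
R_uu_inv = np.linalg.pinv(R_uu, rcond=1e-10)  # one choice of effective inverse
F = R_uu @ R_uu_inv                           # projection on the range of R_uu
G = np.eye(4) - F

u = -R_uu_inv @ R_uy @ y                      # a solution in the form (2)
v = G @ np.array([1.0, 2.0, 3.0, 4.0])        # arbitrary component, with F v = 0
rank = np.linalg.matrix_rank(R_uu, tol=1e-10) # reduction in degrees of freedom
```

Here `rank` is 3, so the residual degrees of freedom fall from 6 to 3, which is the value (2 - 1)(4 - 1) for the two blocks that remain. Adding `v` to `u` gives another solution of equations (1).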

In the next section, two models for the data will be under consideration,
the full model referred to in section 2, and an auxiliary model.
The notation introduced above, namely $R$, $z = \begin{bmatrix} y \\ u \end{bmatrix}$, $F$, etc., will be
used for the full model, and a parallel notation, $\tilde{R}$, $\tilde{z} = \begin{bmatrix} y \\ \tilde{u} \end{bmatrix}$, $\tilde{F}$, etc.,
for the auxiliary model. In particular, the equations for auxiliary
estimates of missing values are

$$\tilde{R}_u\tilde{z} = \tilde{R}_{uu}\tilde{u} + \tilde{R}_{uy}y = 0, \tag{6}$$

which are more conveniently expressed, for present purposes, in the form

$$\tilde{R}_{uu}(u - \tilde{u}) = \tilde{R}_u z. \tag{7}$$

A solution of these equations is

$$(u - \tilde{u}) = \tilde{R}_{uu}^{-1}\tilde{R}_u z. \tag{8}$$
Concomitant variates


Let $\mathcal{X} = [x_1, x_2, \ldots, x_p]$ represent a set of p concomitant variates.

Let $W = [w_1, w_2, \ldots, w_p]$ denote the matrix of values of $\mathcal{X}$
corresponding to z. Then

$$W = \begin{bmatrix} X \\ V \end{bmatrix},$$

where X is the matrix of values of $\mathcal{X}$ corresponding to the existing
observations y, and V is a matrix of estimated values, given by the
equations parallel to (1),

$$R_u W = R_{uu}V + R_{uy}X = 0, \tag{9}$$

or for the variate $x_i$ in particular,

$$R_{uu}v_i + R_{uy}x_i = 0. \tag{10}$$

(Note that the one inverse $R_{uu}^{-1}$ suffices to solve all missing value
equations.)

The standard analysis of covariance on the completed data z and the 
correspondingly completed concomitant data W provides the correct 
residual sums of squares and products for the incomplete data y and 
the corresponding concomitant data X. They are as follows: 


$$z'Rz, \qquad \underset{(p \times 1)}{W'Rz}, \qquad \underset{(p \times p)}{W'RW}. \tag{11}$$

Hence the vector of regression coefficients for (y, X) is correctly
determined as

$$b = (W'RW)^{-1}(W'Rz), \tag{12}$$

and the residual sum of squares for y, eliminating covariant effects, as

$$z'Rz - b'(W'Rz). \tag{13}$$
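The statements (11) to (13) can be checked numerically. The sketch below is an illustration under assumptions: a hypothetical randomized block layout, one invented concomitant variate (p = 1), and two cells treated as missing. Both variates are completed with the same inverse, and the results are compared with a direct regression on the existing observations.

```python
import numpy as np

blocks = np.repeat(np.arange(3), 4)
treats = np.tile(np.arange(4), 3)
X = np.hstack([np.ones((12, 1)), np.eye(3)[blocks], np.eye(4)[treats]])
R = np.eye(12) - X @ np.linalg.pinv(X, rcond=1e-10)

yvar = np.array([4.1, 5.0, 6.2, 5.5, 3.9, 5.2, 6.8, 5.1, 4.6, 5.9, 7.0, 6.3])
xvar = np.array([1.0, 1.4, 2.1, 1.2, 0.8, 1.9, 2.5, 1.1, 1.6, 1.3, 2.2, 2.0])
miss = np.array([1, 6])                       # cells treated as missing (assumed)
obs = np.setdiff1d(np.arange(12), miss)

R_uu_inv = np.linalg.inv(R[np.ix_(miss, miss)])
R_uy = R[np.ix_(miss, obs)]

z = yvar.copy()
z[miss] = -R_uu_inv @ R_uy @ yvar[obs]        # equation (2)
W = xvar.copy()
W[miss] = -R_uu_inv @ R_uy @ xvar[obs]        # equation (10), same inverse
W = W.reshape(-1, 1)

b = np.linalg.solve(W.T @ R @ W, W.T @ R @ z)           # equation (12)
rss_adjusted = float(z @ R @ z - b @ (W.T @ R @ z))     # expression (13)

# Reference: regress the existing y on the model plus the existing x.
Xc = np.hstack([X[obs], xvar[obs].reshape(-1, 1)])
coef, *_ = np.linalg.lstsq(Xc, yvar[obs], rcond=None)
rss_direct = float(np.sum((yvar[obs] - Xc @ coef) ** 2))
```

The regression coefficient `b` agrees with the last element of `coef`, and `rss_adjusted` agrees with `rss_direct`.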

In the next section, a parallel notation will be used for the auxiliary
model, namely, $\tilde{W} = \begin{bmatrix} X \\ \tilde{V} \end{bmatrix}$, $\tilde{b}$, etc.


4. CORRECTION OF AN AUXILIARY RESIDUAL SUM OF SQUARES 


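The apparent value of an auxiliary residual sum of squares, as
determined from the completed data z, is $z'\tilde{R}z$. The correct value,
however, is given by $\tilde{z}'\tilde{R}\tilde{z}$, in which appropriate auxiliary estimates $\tilde{u}$
have been substituted for the main estimates u of missing values.
The apparent value may therefore be adjusted to the correct value by
subtracting the correction

$$\begin{aligned}
C &= z'\tilde{R}z - \tilde{z}'\tilde{R}\tilde{z} \\
  &= (\tilde{R}z)'(z - \tilde{z}) + (z - \tilde{z})'(\tilde{R}\tilde{z}) \\
  &= (\tilde{R}_u z)'(u - \tilde{u}),
\end{aligned} \tag{14}$$

since $\tilde{R}_u\tilde{z} = 0$.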


Using (7) and (8), C can be expressed in the alternative forms

$$\begin{aligned}
C &= (u - \tilde{u})'\tilde{R}_{uu}(u - \tilde{u}), \\
  &= (\tilde{R}_u z)'\tilde{R}_{uu}^{-1}(\tilde{R}_u z).
\end{aligned} \tag{15}$$

The vector $\tilde{R}_u z$ in these correction formulae is the vector of apparent
auxiliary residuals of the estimates u, that is to say, the residuals of u
under the auxiliary model, as determined from the completed data (y, u).

The vector of correct auxiliary residuals of u is $\tilde{F}(u - \tilde{u})$. In the
formulae above, the matrix $\tilde{F}$ is implicit in $\tilde{R}_u$, $\tilde{R}_{uu}$.

A simpler notation than that used above will be introduced here,
for purposes of reference:

Let $\eta$ denote the vector of apparent auxiliary residuals of the main
estimates u, let A denote the matrix of coefficients in the equations for
auxiliary estimates $\tilde{u}$, and $A^{-1}$ an effective inverse of A. Then the
subtractive correction for adjusting the apparent value of an auxiliary
residual sum of squares to the correct value is

$$C = \eta'(u - \tilde{u}) = (u - \tilde{u})'A(u - \tilde{u}) = \eta'A^{-1}\eta. \tag{16}$$

In the next section, specific formulae are given for the important
cases. In other cases the matrix A may be determined by methods
discussed in [10], or else auxiliary estimates $\tilde{u}$ determined by iterative
methods.

As would be expected, the correction C vanishes if the auxiliary
residuals $\eta = 0$, or equivalently, if $\tilde{F}u = \tilde{F}\tilde{u}$, that is, if u and $\tilde{u}$ are
identical in their essential components with respect to the auxiliary
model.
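The correction (16) can be checked against a direct fit of the auxiliary model. The sketch below assumes a hypothetical randomized block layout in which the auxiliary model contains blocks only (the case relevant to the treatment sum of squares); data and missing cells are invented, and the helper name `residual_operator` is ours.

```python
import numpy as np

def residual_operator(X):
    """Residual operator I - E for the least-squares fit on the columns of X."""
    return np.eye(X.shape[0]) - X @ np.linalg.pinv(X, rcond=1e-10)

blocks = np.repeat(np.arange(3), 4)
treats = np.tile(np.arange(4), 3)
one = np.ones((12, 1))
X_full = np.hstack([one, np.eye(3)[blocks], np.eye(4)[treats]])
X_aux = np.hstack([one, np.eye(3)[blocks]])        # auxiliary model: blocks only
R, R_aux = residual_operator(X_full), residual_operator(X_aux)

data = np.array([4.1, 5.0, 6.2, 5.5, 3.9, 5.2, 6.8, 5.1, 4.6, 5.9, 7.0, 6.3])
miss = np.array([1, 6])                            # cells treated as missing (assumed)
obs = np.setdiff1d(np.arange(12), miss)
y = data[obs]

u = -np.linalg.solve(R[np.ix_(miss, miss)], R[np.ix_(miss, obs)] @ y)
z = data.copy()
z[miss] = u                                        # completed with main estimates

eta = (R_aux @ z)[miss]                            # apparent auxiliary residuals of u
A = R_aux[np.ix_(miss, miss)]                      # coefficients of auxiliary equations
C = float(eta @ np.linalg.solve(A, eta))           # formula (16)

apparent = float(z @ R_aux @ z)                    # apparent auxiliary residual SS
coef, *_ = np.linalg.lstsq(X_aux[obs], y, rcond=None)
correct = float(np.sum((y - X_aux[obs] @ coef) ** 2))
```

Subtracting `C` from `apparent` reproduces `correct`, the residual sum of squares of the auxiliary model fitted to the existing observations alone.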


Corrections in an analysis of covariance 


By essentially the same derivation as used above, it can be shown
that the apparent values of auxiliary residual sums of squares and
products may be adjusted to the correct values by subtracting the
corrections

$$(\tilde{R}_u z)'\tilde{R}_{uu}^{-1}(\tilde{R}_u z), \qquad (\tilde{R}_u W)'\tilde{R}_{uu}^{-1}(\tilde{R}_u z), \qquad (\tilde{R}_u W)'\tilde{R}_{uu}^{-1}(\tilde{R}_u W). \tag{17}$$

The auxiliary vector of regression coefficients $\tilde{b}$, and hence the auxiliary
residual sum of squares eliminating covariant effects, may then be
computed by formulae similar to (12) and (13).

Let $\xi_1, \xi_2, \ldots, \xi_p$ denote the vectors of apparent auxiliary residuals
of the estimated values $v_1, v_2, \ldots, v_p$ in the concomitant data. (The
$\xi_i$ are determined by the same formula as for $\eta$.) In this simpler
notation, the corrections (17) become

$$\eta'A^{-1}\eta, \qquad \underset{(p\ \text{corrections})}{\xi_i'A^{-1}\eta}, \qquad \underset{(p^2\ \text{corrections})}{\xi_i'A^{-1}\xi_j}. \tag{18}$$

Note that the one inverse suffices for computation of all these corrections.


5. SOME SPECIFIC CORRECTION FORMULAE 
Null model, $\mathcal{E}(z) = 0$: correction of the crude total sum of squares.

$$C = u'u = \sum u^2. \tag{19}$$
Uniform model, $\mathcal{E}(z) = \mu$: correction of the corrected total sum of squares.


Suppose there are N observations, r of which are estimated values u.
Let G', G, respectively, denote the incomplete and completed totals of
the observations. Then by direct argument,

$$C = \sum_{t=1}^{r} u_t^2 - \frac{G^2}{N} + \frac{G'^2}{N - r}. \tag{20}$$

Alternatively, using formula (16), the apparent auxiliary residuals of
the estimates u are $\eta = u - G/N$, the matrix $A = I - 11'/N$ (where I
is the unit matrix, 1 a column of units, and hence 11' a matrix with all
elements unity), and thus $A^{-1} = I + 11'/(N - r)$. Hence formula
(16) gives

$$C = \sum_{t=1}^{r} \eta_t^2 + \frac{R^2}{N - r}, \tag{21}$$

where $R = \sum \eta_t$.
However, the correction is easier to compute in the form (20),
except when there is only one missing value, in which case (21) reduces to

$$C = \frac{N}{N - 1}\,\eta^2 = \frac{(Nu - G)^2}{N(N - 1)}.$$
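The equivalence of the two forms (20) and (21) for the uniform model can be confirmed with a few lines of arithmetic. The values of the existing observations and of the estimates below are invented for illustration.

```python
import numpy as np

y = np.array([4.1, 6.2, 5.5, 3.9, 5.2, 5.1, 4.6, 5.9, 7.0, 6.3])  # existing values
u = np.array([5.3, 6.0])                                          # estimated values
N, r = y.size + u.size, u.size
G_inc, G = y.sum(), y.sum() + u.sum()        # incomplete and completed totals

# Form (20): direct argument from the two corrected total sums of squares.
C20 = float(np.sum(u ** 2) - G ** 2 / N + G_inc ** 2 / (N - r))

# Form (21): apparent residuals eta and the inverse of A = I - 11'/N.
eta = u - G / N
A = np.eye(r) - np.ones((r, r)) / N
A_inv = np.eye(r) + np.ones((r, r)) / (N - r)
C21 = float(eta @ A_inv @ eta)               # sum(eta^2) + (sum eta)^2 / (N - r)
```

Both values equal the difference between the corrected total sum of squares of the completed data and that of the existing observations.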


Single classification, $\mathcal{E}(z_{ij}) = \mu + \alpha_i$.


The auxiliary residual sum of squares in this case is the Within Class
sum of squares, and this is just the sum of the corrected total sums of
squares for the classes. Therefore each class with missing values
contributes a correction of the form (20).

For a particular class with n observations, r of which are estimated
values, let T', T denote the incomplete and completed totals,
respectively. The correction for that class is then

$$C = \sum_{t=1}^{r} u_t^2 - \frac{T^2}{n} + \frac{T'^2}{n - r} \qquad (r < n). \tag{22, 23}$$

In the singular case, when the whole class has been estimated, the
correction is clearly

$$C = \sum_{t=1}^{n} u_t^2 - \frac{T^2}{n}. \tag{24}$$

It should be noted, in particular, that the correction of the treatment
sum of squares in the analysis for Randomized Block and incomplete
block designs is given by (23). The auxiliary classification is the block
classification, so that in (23), T', T are the incomplete and completed
block totals. Formulae for these cases were given earlier by Yates
[11] and Cornish [2, 3].


Two-way classification (p × q), one observation per subclass, $\mathcal{E}(z_{ij}) = \mu + \alpha_i + \beta_j$.


The most familiar example of this auxiliary arrangement is the
row × column classification of a Latin square or Youden square, and
the correction formulae derived below give the required correction of
the treatment sum of squares for these designs.

The auxiliary residual sum of squares is the interaction sum of
squares for the two-way classification, and the apparent auxiliary
residuals of the estimates u are given by

$$pq\,\eta_u = pq\,u - pR_u - qC_u + G, \tag{25}$$

where $R_u$, $C_u$ are the relevant row and column totals, and G the grand
total, for the completed data.

In the equations for auxiliary estimates $\tilde{u}$, the matrix A has elements
$a_{uv}$, each of which corresponds to a pair of missing values (u, v) and is
determined by their relative position in the classification, according to
the following table:

    a_uv × pq            Same row           Different rows
    Same column          (p - 1)(q - 1)     -(q - 1)             (26)
    Different columns    -(p - 1)           1

If the matrix A is expressed in the form

$$A = B + 11'/pq,$$

the table of coefficients for B, corresponding to (26), is

    b_uv × pq            Same row           Different rows
    Same column          pq - p - q         -q                   (27)
    Different columns    -p                 0



Generally the full set of missing values in the two-way classification 
can be divided into groups which are disjoint with respect to rows and 
columns, that is to say, such that the missing values in any group have 
no rows or columns in common with missing values in other groups. 
Consequently the matrix B, by virtue of the zero coefficient in (27),
can be expressed as the direct sum of matrices $B_i$, each of which
corresponds to a group of missing values as defined. This last-mentioned
property considerably simplifies the inversion of A. For if A and B
are both non-singular,

$$A^{-1} = B^{-1} - \frac{\delta\delta'}{pq + \Delta}, \tag{28}$$

and

$$B^{-1} = B_1^{-1} + B_2^{-1} + \cdots \quad \text{(direct sum)},$$

where in (28), $\delta$ is the column of row sums of $B^{-1}$, and $\Delta$ is the sum of all
elements in $B^{-1}$.

Let $\eta_i$ denote the vector of residuals for missing values in the i-th
group, $\delta_i$ denote the column of row sums of $B_i^{-1}$, $\Delta_i$ the total of the
column $\delta_i$. Let $w_i = \delta_i'\eta_i/pq$, $d_i = \Delta_i/pq$, $W = \sum w_i$, $D = \sum d_i$.

Then substitution of (28) in $C = \eta'A^{-1}\eta$ gives

$$C = \sum_i \eta_i'B_i^{-1}\eta_i - \frac{pq\,W^2}{1 + D}. \tag{29}$$

The terms $\eta_i'B_i^{-1}\eta_i$ may be called intra-group corrections, and the last
term in (29), the inter-group correction.
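The group decomposition can be illustrated as follows. This is a sketch under assumptions: the dimensions, the positions of the missing cells, and the residuals `eta` are hypothetical, and the function name `a_coeff` is ours. The matrix A is built from the coefficient table (26), and the correction is computed both by direct inversion of A and by the intra-group and inter-group terms of (29).

```python
import numpy as np

p, q = 5, 6                                   # rows x columns (hypothetical)
cells = [(0, 0), (2, 3), (4, 3)]              # (row, column) of each missing value
groups = [[0], [1, 2]]                        # groups disjoint in rows and columns
eta = np.array([1.2, -0.7, 0.4])              # hypothetical apparent residuals

def a_coeff(c1, c2):
    """Coefficient a_uv x pq from table (26) for two cell positions."""
    same_row, same_col = c1[0] == c2[0], c1[1] == c2[1]
    if same_row and same_col:
        return (p - 1) * (q - 1)
    if same_row:
        return -(p - 1)
    if same_col:
        return -(q - 1)
    return 1

A = np.array([[a_coeff(c, d) for d in cells] for c in cells]) / (p * q)
B = A - 1.0 / (p * q)                         # A = B + 11'/pq

C_direct = float(eta @ np.linalg.solve(A, eta))

intra, W, D = 0.0, 0.0, 0.0
for g in groups:
    Bi_inv = np.linalg.inv(B[np.ix_(g, g)])
    delta = Bi_inv.sum(axis=1)                # row sums of the inverse block
    intra += float(eta[g] @ Bi_inv @ eta[g])  # intra-group correction
    W += float(delta @ eta[g]) / (p * q)
    D += float(delta.sum()) / (p * q)

C_29 = intra - p * q * W ** 2 / (1 + D)       # formula (29)
```

Only the small blocks of B are inverted in the loop, which is the practical advantage of the decomposition when the groups are small.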

Each intra-group correction, and the corresponding quantities d and
w for the inter-group correction, will depend on the configuration in
the p × q classification of the relevant group of missing values.
Formulae for the simplest configurations are set out in Table 1. In the
event that all missing values are in the one configuration or group,
Table 1 also provides the complete corrections $\eta'A^{-1}\eta$.

These formulae should be sufficient for most practical situations, 
since, in particular, any group with four or less missing values must 
conform to one of the configurations tabulated. 

It is theoretically possible, but unlikely to occur in practice, that
though A is non-singular, B is singular. Let $B_s$ denote the singular
component in the direct sum of B, and let $A_s$ denote the corresponding
principal minor in A. $A_s$ will be non-singular. The correction C is
given by substituting $A_s^{-1}$ for $B_s^{-1}$ in (29), $w_s$ and $d_s$ being determined
accordingly, and adding the additional correction

$$\frac{pq\,Z^2}{(1 + D)^2(1 - d_s) + (1 + D)d_s^2}, \tag{30}$$

where $Z = (1 + D)w_s - d_sW$.


[TABLE 1. Formulae for the intra-group corrections, and for the quantities d and w, for the simplest configurations of missing values. The body of the table is not legible in this copy.]



Several two-way classifications 


As the auxiliary residual sum of squares in this case is the sum of the
interaction sums of squares for the several classifications, a correction
of the form (29) applies for each classification with missing values.
Note that in (25), G will be the completed total for the particular
classification.

In particular, the formula (29) provides the necessary corrections for
the treatment sum of squares in lattice square designs. The formula for
a single missing value was given earlier by Cornish [3].


Two-way classifications: the singular cases 


These usually arise through the loss of whole rows or columns, and it is
unlikely in practice that the data will be analysed with estimated
missing values. However, missing value principles may be applied to
determine the appropriate analysis for the incomplete design.

It is sufficient to note here, that if r rows and s columns are missing
from a p × q classification, the correct auxiliary residual sum of squares
is the interaction sum of squares for the (p - r) × (q - s) classification.
If there are some additional missing values scattered in this classification,
the correction for the additional values is given by (29), the matrix A
and the residuals $\eta$ being determined relative to the (p - r) × (q - s)
classification.

The analysis of Latin squares with missing rows, columns or treatments
has been discussed by Yates [11], Yates and Hale [12], DeLury [6],
and the analysis of lattice squares with a missing row, column or
treatment, by Cornish [5].


6. ADJUSTMENT OF VARIANCES AND COVARIANCES 


We shall assume in this section that the covariance matrix of the
observations is $I\sigma^2$, or in other words, that the observations are
uncorrelated and homogeneous in variance.

For notation and preliminary formulae, refer to section 3. The
arguments will be given for the singular case, the non-singular case
then being obvious.


Covariance matrix for the completed set of observations 


Let $V(z) = \begin{bmatrix} V(y) & V(y, u) \\ V(u, y) & V(u) \end{bmatrix}$ denote the covariance matrix for
$z = \begin{bmatrix} y \\ u \end{bmatrix}$.

It should be noted firstly that only the component Fu of u is subject
to statistical variation, the orthogonal component Gu being arbitrary.
Assigning zero variance to Gu, we have V(u) = V(Fu), V(u, y) =
V(Fu, y).

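Now $Fu = -FR_{uu}^{-1}R_{uy}y$, so that

$$V(u, y) = -FR_{uu}^{-1}R_{uy}\sigma^2,$$

$$\begin{aligned}
V(u) &= FR_{uu}^{-1}R_{uy}R_{yu}(FR_{uu}^{-1})'\sigma^2 \\
     &= FR_{uu}^{-1}(R_{uu} - R_{uu}^2)(FR_{uu}^{-1})'\sigma^2, \quad \text{since } R^2 = R, \\
     &= FR_{uu}^{-1}R_{uu}(FR_{uu}^{-1})'\sigma^2 - FR_{uu}^{-1}R_{uu}^2(FR_{uu}^{-1})'\sigma^2 \\
     &= FR_{uu}^{-1}F\sigma^2 - F\sigma^2, \quad \text{by virtue of (4)}.
\end{aligned}$$

Therefore

$$V(z) = \begin{bmatrix} I & -R_{yu}R_{uu}^{-1}F \\ -FR_{uu}^{-1}R_{uy} & FR_{uu}^{-1}F - F \end{bmatrix}\sigma^2. \tag{31}$$

In the non-singular case

$$V(z) = \begin{bmatrix} I & -R_{yu}R_{uu}^{-1} \\ -R_{uu}^{-1}R_{uy} & R_{uu}^{-1} - I \end{bmatrix}\sigma^2. \tag{32}$$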


With reference to the point discussed by Fairfield-Smith [7], it may
be noted that here we have determined the covariance matrix of the
estimates u of missing observations, namely, $V(u) = (FR_{uu}^{-1}F - F)\sigma^2$.

If, on the other hand, we let u represent the missing observations in
the sense of random variates, and let $\tilde{u}$ (instead of u as above) represent
the predicted values derived from y, then $V(u - \tilde{u}) = (FR_{uu}^{-1}F)\sigma^2$,
this covariance matrix referring, as above, to the predictable component
Fu of u, and the corresponding prediction $F\tilde{u}$. Expressed in another
way, if the variates were normally distributed, the fiducial distribution
for Fu, determined from y, would have mean $F\tilde{u}$ and covariance matrix
$(FR_{uu}^{-1}F)s^2$, where $s^2$ is the residual mean square for the data.


Covariance matrix for linear comparisons 


We define the term linear comparison as any linear function of the
least squares estimators of the parameters in the linear model.

Let $Lz = L_yy + L_uu$ denote a set of linear comparisons in the
completed data z. From the definition above it follows that the matrix
L satisfies

$$LR = 0, \tag{33}$$

which implies, in particular, that

$$L_yR_{yu} = -L_uR_{uu}. \tag{34}$$


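In the singular case, when u is to some extent arbitrary, we shall be
interested only in well-determined comparisons, that is, comparisons
such that

$$L_u = L_uF. \tag{35}$$

For well-determined comparisons Lz,

$$\begin{aligned}
V(Lz) &= [L_yL_y' + L_u(FR_{uu}^{-1}F - F)L_u' - L_yR_{yu}R_{uu}^{-1}FL_u' - L_uFR_{uu}^{-1}R_{uy}L_y']\sigma^2 \\
      &= [L_yL_y' + L_u(FR_{uu}^{-1}F - F)L_u' + L_uR_{uu}R_{uu}^{-1}FL_u' + L_uFR_{uu}^{-1}R_{uu}L_u']\sigma^2, \quad \text{by (34)}, \\
      &= [L_yL_y' + L_u(FR_{uu}^{-1}F - F + 2F)L_u']\sigma^2, \quad \text{by virtue of (4)}.
\end{aligned}$$

Since $L_u = L_uF$, and $LL' = L_yL_y' + L_uL_u'$,

$$V(Lz) = LL'\sigma^2 + L_uR_{uu}^{-1}L_u'\sigma^2. \tag{36}$$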


In this formula, $LL'\sigma^2$ is the standard covariance matrix for Lz, and
$L_uR_{uu}^{-1}L_u'\sigma^2$ is the adjustment for the effect of missing values.
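Formula (36) can be verified numerically in the non-singular case. The sketch below assumes a hypothetical randomized block layout, takes the comparisons Lz to be the four treatment means, and works in units of σ². Since u is a linear function of y, Lz can be written as My, whose covariance matrix MM' is compared with the right-hand side of (36).

```python
import numpy as np

blocks = np.repeat(np.arange(3), 4)
treats = np.tile(np.arange(4), 3)
X = np.hstack([np.ones((12, 1)), np.eye(3)[blocks], np.eye(4)[treats]])
R = np.eye(12) - X @ np.linalg.pinv(X, rcond=1e-10)

miss = np.array([1, 6])                       # cells treated as missing (assumed)
obs = np.setdiff1d(np.arange(12), miss)

L = np.eye(4)[treats].T / 3.0                 # rows give the four treatment means
L_y, L_u = L[:, obs], L[:, miss]
R_uu_inv = np.linalg.inv(R[np.ix_(miss, miss)])
R_uy = R[np.ix_(miss, obs)]

# With u = -R_uu^{-1} R_uy y, Lz is the linear function M y of the existing data.
M = L_y - L_u @ R_uu_inv @ R_uy
V_direct = M @ M.T                            # covariance of Lz, units of sigma^2
V_formula = L @ L.T + L_u @ R_uu_inv @ L_u.T  # formula (36)
```

The second term of `V_formula` is non-zero only for the treatments that involve missing values, which is why the remaining treatment means keep their standard variance.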


Covariant adjustments 


For linear comparisons Lz, unadjusted with respect to concomitant 
variates, the standard covariance matrix is adjusted for the effect of 
missing values as described above. 

Referring to section 3, it is clear from (12), since LR = 0, that the
vector b of regression coefficients has zero covariance with Lz, and that
the covariance matrix for b is given by the standard formula

$$(W'RW)^{-1}\sigma^2. \tag{37}$$

Therefore, if the concomitant variate values are adjusted to $W_0$, so
that the covariant adjustment to Lz is

$$-L(W - W_0)b,$$

the corresponding adjustment to the covariance matrix of Lz is given
by the standard formula

$$L(W - W_0)(W'RW)^{-1}(W - W_0)'L'\sigma^2. \tag{38}$$


Note that this adjustment is in addition to, and independent of, 
the adjustment for the effect of missing values. 


[TABLE 2. Latin square data, with row, column and treatment totals (incomplete and completed). The body of the table is not legible in this copy.]



7. NUMERICAL EXAMPLE 


Four observations have been omitted from the data for a 6 × 6 Latin
square, given in Table 2. These data were taken from Table 4.7 in 
Cochran and Cox’s Experimental Designs [1], but for present purposes 
the original context has been ignored, the familiar terms Rows, Columns, 
and Treatments being substituted. 

The missing observations in Table 2 have been replaced by estimates, 
shown in bold type, which will be derived below. The incomplete and 
completed totals are also shown. 


Estimation of missing values 
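The equations for estimating the missing values are

$$\begin{aligned}
u_i &= (6R_{u_i} + 6C_{u_i} + 6T_{u_i} - 2G)/36, \\
    &= (3R_{u_i} + 3C_{u_i} + 3T_{u_i} - G)/18, \qquad (i = 1, 2, 3, 4),
\end{aligned}$$

in which the relevant totals $R_{u_i}$, $C_{u_i}$, etc. are completed totals
containing the unknowns $u_1$, $u_2$, etc. For instance, $R_{u_1} = 25.4 + u_1$,
$C_{u_2} = C_{u_3} = C_{u_4} = 16.6 + u_2 + u_3 + u_4$. Multiplying through by the
factor 18, and transferring terms in the unknowns to the left we obtain

$$\left[\begin{array}{r|rrr}
10 & -2 & 1 & 1 \\ \hline
-2 & 10 & -2 & -2 \\
1 & -2 & 10 & -2 \\
1 & -2 & -2 & 10
\end{array}\right]
\begin{bmatrix} u_1 \\ u_2 \\ u_3 \\ u_4 \end{bmatrix}
=
\begin{bmatrix} 39.5 \\ 1.4 \\ 52.4 \\ 33.5 \end{bmatrix}.$$

The matrix of these equations is easily inverted in the partitioned form
indicated, giving

$$\begin{bmatrix} u_1 \\ u_2 \\ u_3 \\ u_4 \end{bmatrix}
= \frac{1}{456}
\left[\begin{array}{r|rrr}
48 & 8 & -4 & -4 \\ \hline
8 & 52 & 12 & 12 \\
-4 & 12 & 51 & 13 \\
-4 & 12 & 13 & 51
\end{array}\right]
\begin{bmatrix} 39.5 \\ 1.4 \\ 52.4 \\ 33.5 \end{bmatrix}
=
\begin{bmatrix} 3.43 \\ 3.11 \\ 6.51 \\ 4.93 \end{bmatrix}.$$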
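The solution of these equations can be checked as follows. Two points in this sketch are assumptions rather than readings from the paper. The coefficient matrix is written out from the structure described in the text: $u_1$ and $u_2$ share a treatment, $u_2$, $u_3$, $u_4$ share a column, and $u_1$ shares no row or column with the others. The right-hand sides 1.4 and 33.5 are partly inferred from the printed solution because they are not clearly legible in this copy.

```python
import numpy as np

# Coefficient matrix of the missing value equations (after multiplying by 18).
M = np.array([[10, -2, 1, 1],
              [-2, 10, -2, -2],
              [1, -2, 10, -2],
              [1, -2, -2, 10]], dtype=float)

# Numerators of the inverse as printed, with common divisor 456.
inv456 = np.array([[48, 8, -4, -4],
                   [8, 52, 12, 12],
                   [-4, 12, 51, 13],
                   [-4, 12, 13, 51]], dtype=float)

rhs = np.array([39.5, 1.4, 52.4, 33.5])      # 1.4 and 33.5 partly inferred (assumption)

u = np.linalg.solve(M, rhs)                  # direct solution
u_via_inverse = inv456 @ rhs / 456.0         # solution from the printed inverse
```

With these values the estimates round to 3.43, 3.11, 6.51 and 4.93, in agreement with the figures quoted in the text.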


Corrections for the analysis of variance 
Correction factors: 787.551 (completed data),
706.880 (incomplete data).
Corrections of auxiliary residual sums of squares:


$C_G$ (correction of the corrected total sum of squares): $\sum u^2 = 88.122$,

$$C_G = 88.122 - 787.551 + 706.880 = 7.451.$$

$C_T$: In the treatment classification, B and D have one missing
replicate, and F has two. Applying formulae (22) and (23), the correction
for the residual sum of squares eliminating treatments only is

$$C_T = \frac{(6 \times 4.93 - 29.83)^2}{30} + \frac{(6 \times 6.51 - 43.01)^2}{30} + \cdots = 15.049.$$


$C_{RC}$: With respect to the auxiliary, row × column classification,
the auxiliary residuals of the estimated missing values are given by the
formula

$$\eta = (36u - 6R_u - 6C_u + G)/36.$$

However, since treatments are orthogonal to rows and columns in the
design, the residuals are more simply computed as the estimates of the
treatment effects for the missing values,

$$\eta = (6T_u - G)/36,$$

as may be seen from the missing value equations. It will be found more
convenient to use the undivided quantities $36\eta$ in the computations
below.

The formulae given in Table 1 may be used to compute the required
correction for the auxiliary residual sum of squares. The missing values
fall into two groups which have no rows or columns in common, namely,
$(u_1)$ and $(u_2, u_3, u_4)$. These correspond to configurations 1 and 2 of
Table 1. For $(u_1)$, a = 24, $36\eta = -129.74$, d = 1/24, and $36w =
-129.74/24$. For $(u_2, u_3, u_4)$ the preliminary quantities are p' = 5,
s = 3, $\sigma$ = 3, a = 12, b = 18. The quantities $36\eta$ are -129.74, 89.68
and 10.60 respectively, the sum of which is 36P = -29.46, d = 3/12,
36w = -29.46/12. Referring to formula (29), D = 7/24, 36W =
-188.66/24. Hence


Ceo = 129-74 (129-74 + 89.68" + 10.60" 29.46") _ 188.66" 
1080 720 26,784 


41.289. 


$C_{RT}$, $C_{CT}$: Considering in turn the auxiliary row × treatment,
column × treatment classifications we may determine the corrections
$C_{RT}$, $C_{CT}$ in a similar way to that described above. The corrections are

$$C_{RT} = 17.950, \qquad C_{CT} = 5.115.$$

Normally these corrections would not be required (see below).

[TABLE 3. Analyses of variance: the standard analysis of the completed data, and the adjusted analyses. The body of the table is not legible in this copy.]


Analyses of variance 


The standard analysis of the completed data, and the correct analyses 
derived by adjustment from the standard analysis, are shown in Table 3. 
The necessary adjustments are indicated in parentheses. For brevity 
the factors are represented by the symbols R, C, T, and component 
sums of squares by symbols such as T.RC, which denotes the treatment
sum of squares eliminating row and column effects. 

In the original context, significance tests were required for the
components T.RC, R.CT, and C.RT. The relevant information is
tabulated in the summary, Table 4. The F ratios that would be given
by the unadjusted analysis are also shown, in parentheses. It can be
seen that these are appreciably in error, the errors being 26, 33, and
34% respectively. The corresponding errors in the probabilities are
much greater, as the percentage points for F indicate.


TABLE 4
SUMMARY

Component   d.f.   S.S.      M.S.      F                        % points for F(5, 16)
R.CT          5     19.42     3.884    1.10 n.s. (1.39 n.s.)    *5%:     2.85
C.RT          5     56.56    11.211    3.17* (4.21*)            **1%:    4.44
T.RC          5    122.66    24.532    6.93** (9.26***)         ***0.1%: 7.27
Error        16     56.66     3.541
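The F ratios and the quoted percentage errors can be checked directly from the mean squares. The following is a minimal sketch: the mean squares and the unadjusted F ratios are the Table 4 values, with the C.RT unadjusted ratio taken as 4.21 on the assumption that this is the partly illegible printed entry; the function names are illustrative only.

```python
# Mean squares from Table 4; the error mean square has 16 d.f.
ERROR_MS = 3.541
ADJUSTED_MS = {"R.CT": 3.884, "C.RT": 11.211, "T.RC": 24.532}

# F ratios given by the unadjusted analysis (parenthesised in Table 4).
# The C.RT value 4.21 is an assumed reading of a damaged entry.
UNADJUSTED_F = {"R.CT": 1.39, "C.RT": 4.21, "T.RC": 9.26}


def f_ratio(component):
    """Adjusted F ratio: component mean square over error mean square."""
    return ADJUSTED_MS[component] / ERROR_MS


def percent_error(component):
    """Percentage by which the unadjusted F exceeds the adjusted F."""
    adjusted = round(f_ratio(component), 2)
    return 100.0 * (UNADJUSTED_F[component] - adjusted) / adjusted


for name in ADJUSTED_MS:
    print(f"{name}: F = {f_ratio(name):.2f}, "
          f"unadjusted F in error by {percent_error(name):.0f}%")
```

Rounded to whole percentages, the three errors come out as the 26, 33, and 34% quoted in the text.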

The adjusted analyses in Table 3 have been presented mainly for
purposes of illustration. In practice only the adjusted components in
the summary need be computed, requiring only the corrections $C_{RC}$,
$C_{RT}$, and $C_{CT}$. More commonly, the row and column components
are of no interest, beyond indicating the effectiveness of the design in
the particular circumstances. Then only the treatment sum of squares
need be adjusted, requiring only the correction $C_{RC}$.


Variances of treatment comparisons 


We shall first determine the intra-row-column variance component, 
of the covariance matrix for the treatment means. This is given by 
formula (36). 


Treatments A, C, E involve no missing values. The corresponding
means therefore have variance $\sigma^2/6$ and are uncorrelated with the other
treatment means.


For treatments F, D, B (this ordering corresponding to the order of
the missing values), we have, in the notation of formula (36),

$$L_u = \frac{1}{6}\begin{pmatrix} 1 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}.$$

From the missing value equations,
r 48 8 -—4 
8 S2 12 12 
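Therefore

$$L_u R L_u' = \frac{1}{912}\begin{pmatrix} 116 & 8 & 8 \\ 8 & 51 & 13 \\ 8 & 13 & 51 \end{pmatrix},$$

so that

$$V\begin{pmatrix} \bar{x}_F \\ \bar{x}_D \\ \bar{x}_B \end{pmatrix}
= \begin{pmatrix} 1/6 & 0 & 0 \\ 0 & 1/6 & 0 \\ 0 & 0 & 1/6 \end{pmatrix}\sigma^2
+ \begin{pmatrix} 0.1272 & 0.0088 & 0.0088 \\ 0.0088 & 0.0559 & 0.0143 \\ 0.0088 & 0.0143 & 0.0559 \end{pmatrix}\sigma^2
= \begin{pmatrix} 0.294 & 0.009 & 0.009 \\ 0.009 & 0.223 & 0.014 \\ 0.009 & 0.014 & 0.223 \end{pmatrix}\sigma^2.$$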


Hence the variances and standard errors of the treatment comparisons
are as follows:


Comparisons          Variance             Standard error
(A; C; E)            $0.333\sigma^2$      $0.577\sigma$
(A, C, E; B, D)      $0.389\sigma^2$      $0.624\sigma$
(A, C, E; F)         $0.461\sigma^2$      $0.679\sigma$
(B; D)               $0.417\sigma^2$      $0.646\sigma$
(B, D; F)            $0.500\sigma^2$      $0.707\sigma$
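The tabulated variances follow from the covariance matrix of the treatment means by the usual rule var(a - b) = var(a) + var(b) - 2 cov(a, b). The sketch below illustrates that arithmetic. It assumes the adjustment matrix has integer entries over a denominator of 912, which is the value that reproduces the decimal entries 0.1272, 0.0559, and 0.0143; the function and constant names are hypothetical.

```python
from fractions import Fraction
from math import sqrt

# Adjustment to the covariance matrix of the treatment means, in units of
# sigma^2, for the treatments affected by missing values (order F, D, B).
AFFECTED = ["F", "D", "B"]
ADJUSTMENT = [[116, 8, 8], [8, 51, 13], [8, 13, 51]]  # each entry over 912 (assumed)


def covariance(a, b):
    """Covariance of two treatment means in units of sigma^2."""
    value = Fraction(1, 6) if a == b else Fraction(0)
    if a in AFFECTED and b in AFFECTED:
        i, j = AFFECTED.index(a), AFFECTED.index(b)
        value += Fraction(ADJUSTMENT[i][j], 912)
    return value


def comparison_variance(a, b):
    """Variance of the difference between two treatment means."""
    return covariance(a, a) + covariance(b, b) - 2 * covariance(a, b)


for a, b in [("A", "C"), ("A", "B"), ("A", "F"), ("B", "D"), ("B", "F")]:
    v = float(comparison_variance(a, b))
    print(f"{a} - {b}: variance {v:.3f} sigma^2, standard error {sqrt(v):.3f} sigma")
```

Under these assumptions the computed variances agree with the tabulated ones to within about 0.002, the small differences being attributable to rounding in the printed matrix.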




8. B.I.B. DESIGNS WITH MISSING BLOCKS


To illustrate the theoretical application of missing value principles,
we shall derive the intra-block analysis for B.I.B. designs with one or more
missing blocks, but with no more than one replicate of any treatment missing.
The case of a single missing block was examined earlier by Cornish [4],
who also considered a single missing treatment.


Notation 


In the complete design: $b$ blocks, $k$ treatments per block; $t$ treatments
replicated $r$ times; intra-block pair replication of treatments, $\lambda$. $N =
bk = rt$, $\lambda = r(k - 1)/(t - 1)$.

Effective treatment replication, $r' = \lambda t/k$. ($E = r'/r$.)
Estimated treatment effects,

$$t_g = \frac{1}{r'} Q_g ,$$

where

$$Q_g = T_g - \frac{1}{k} \sum B ,$$

the summation of block totals referring to the blocks with the treatment.
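As a small illustration of the notation, the parameters of a complete B.I.B. design can be computed from $t$, $k$, and $r$. This is only a sketch: the function name and the example design (9 treatments in blocks of 3, each replicated 4 times) are assumptions chosen for illustration, not taken from the paper.

```python
from fractions import Fraction


def bib_parameters(t, k, r):
    """Derived parameters of a balanced incomplete block design.

    t: number of treatments, k: treatments per block, r: replications.
    Returns b, lambda, the effective replication r' and the efficiency E.
    """
    b = Fraction(r * t, k)              # from N = bk = rt
    lam = Fraction(r * (k - 1), t - 1)  # lambda = r(k - 1)/(t - 1)
    r_eff = lam * t / k                 # r' = lambda t / k
    return {"b": b, "lambda": lam, "r_eff": r_eff, "E": r_eff / r}


# Hypothetical example: t = 9, k = 3, r = 4.
print(bib_parameters(9, 3, 4))
```

For this example the sketch gives $b = 12$, $\lambda = 1$, $r' = 3$, and $E = 3/4$. Non-integer values of $b$ or $\lambda$ would indicate that no balanced design exists for the chosen $t$, $k$, $r$.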

In the incomplete design, with $s$ missing blocks (numbered
$j = 1, 2, \ldots, s$), the treatments may be divided into $(s + 1)$ groups.
The first $s$ groups, each of $k$ treatments, contain the treatments with one
missing replicate, and correspond to the $s$ blocks in which the missing
replicates occur. The treatment effects in the $j$th block will be designated
$t_{jq}$, and the mean treatment effect over the $j$th block denoted by $\bar{t}_j$.
The $(t - ks)$ treatments in the remaining group have no missing replicate,
and their effects will be designated $t_{cg}$. The treatments will be
numbered serially within each group, $g = 1, 2, \ldots$.

Incomplete quantities, when they occur, will be distinguished by the
superscript ($^\circ$).
Missing values

The equations for the missing values in the $j$th block are

$$u_{jq} = b_j + t_{jq} = \bar{u}_j + t_{jq} - \bar{t}_j , \qquad q = 1, 2, \ldots, k. \qquad (39)$$

Now

$$t_{jq} = t^\circ_{jq} + \frac{1}{r'}(u_{jq} - \bar{u}_j). \qquad (40)$$

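Substituting, the equations (39) become

$$\left(1 - \frac{1}{r'}\right)(u_{jq} - \bar{u}_j) = t^\circ_{jq} - \bar{t}^\circ_j , \qquad (41)$$

which give the solutions

$$u_{jq} = \bar{u}_j + \frac{r'}{r' - 1}\left(t^\circ_{jq} - \bar{t}^\circ_j\right) , \qquad (42)$$

in which the mean $\bar{u}_j$ may be arbitrarily determined.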
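Estimates of treatment effects

$t_{cg}$: The estimates $t_{cg}$ are clearly unaffected by the missing blocks, and
are given by the standard formula

$$t_{cg} = \frac{1}{r'} Q_{cg} . \qquad (43)$$

$t_{jq}$: Substitution of the solutions (42) in (40) gives

$$t_{jq} = \frac{1}{r' - 1}\left(r' t^\circ_{jq} - \bar{t}^\circ_j\right)
= \frac{1}{r' - 1}\left(Q^\circ_{jq} - \frac{1}{kr'}\sum_{q=1}^{k} Q^\circ_{jq}\right). \qquad (44)$$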


Variances of treatment comparisons


We shall first determine the adjustments, for the effect of missing
values, in the covariance matrix for the $t_g$, using formula (36).
Let $t_j$, $j = 1, 2, \ldots, s$, and $t_c$ denote the appropriate vectors of $t_g$.
It is clear that $V(t_c)$ and the $V(t_c, t_j)$ need no adjustment, since the
$t_{cg}$ do not involve the missing values. Now from (40),

$$t_j = t^\circ_j + \frac{1}{r'} F u_j ,$$

where $F$ is the idempotent matrix $I - 11'/k$, and from (41) it is apparent
that $\frac{r'}{r' - 1} I$ is an effective inverse for the missing value equations.

From formula (36) it follows that the $V(t_i, t_j)$ need no adjustment, and
that, for the covariance matrices $V(t_j)$, the necessary adjustments are

$$\frac{1}{r'(r' - 1)} F \sigma^2 , \quad \text{i.e.} \quad \frac{1}{r'(r' - 1)}\left(I - \frac{1}{k} 11'\right)\sigma^2 . \qquad (45)$$


It is now a simple matter to adjust the variances of affected treatment
comparisons. There are four types of comparison, the variances of
which are as follows:

(i) both treatments with no missing replicate: $\frac{2}{r'}\sigma^2$ (standard formula)

(ii) both treatments in the same missing block: $\frac{2}{r' - 1}\sigma^2$

(iii) treatments in different missing blocks: $\frac{2(kr' - 1)}{kr'(r' - 1)}\sigma^2$

(iv) only one treatment in a missing block: $\frac{1}{r'}\sigma^2 + \frac{kr' - 1}{kr'(r' - 1)}\sigma^2$
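The four types of comparison can be checked numerically on a small design. The sketch below is an illustration under stated assumptions, not the paper's own computation. The example design is hypothetical (the 12 lines of the affine plane of order 3, with $t = 9$, $k = 3$, $r = 4$, $\lambda = 1$, $r' = 3$). The variances are obtained from the intra-block information matrix of the incomplete design rather than from formula (36). The expressions they are compared with, $2/r'$, $2/(r'-1)$, $2(kr'-1)/[kr'(r'-1)]$, and $1/r' + (kr'-1)/[kr'(r'-1)]$, in units of $\sigma^2$, are reconstructed from a partly illegible source.

```python
from fractions import Fraction

# Hypothetical example: the affine plane of order 3 as a B.I.B. design.
BLOCKS = [
    (0, 1, 2), (3, 4, 5), (6, 7, 8),
    (0, 3, 6), (1, 4, 7), (2, 5, 8),
    (0, 4, 8), (1, 5, 6), (2, 3, 7),
    (0, 5, 7), (1, 3, 8), (2, 4, 6),
]
T, K = 9, 3
R_EFF = Fraction(3)  # r' = lambda * t / k with lambda = 1


def information_matrix(blocks):
    """C = diag(replications) - N N'/k for blocks of equal size k."""
    c = [[Fraction(0)] * T for _ in range(T)]
    for block in blocks:
        for g in block:
            c[g][g] += 1
            for h in block:
                c[g][h] -= Fraction(1, K)
    return c


def solve(a, rhs):
    """Solve a x = rhs by Gauss-Jordan elimination in exact arithmetic."""
    n = len(a)
    m = [row[:] + [rhs[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = next(i for i in range(col, n) if m[i][col] != 0)
        m[col], m[pivot] = m[pivot], m[col]
        m[col] = [x / m[col][col] for x in m[col]]
        for i in range(n):
            if i != col and m[i][col] != 0:
                factor = m[i][col]
                m[i] = [x - factor * y for x, y in zip(m[i], m[col])]
    return [row[n] for row in m]


def difference_variance(blocks, a, b):
    """Variance of the estimated difference t_a - t_b, in units of sigma^2."""
    c = information_matrix(blocks)
    # C is singular (its rows sum to zero). Adding 1 to every entry gives a
    # nonsingular matrix that yields the same answer for any contrast.
    full = [[x + 1 for x in row] for row in c]
    contrast = [Fraction(0)] * T
    contrast[a], contrast[b] = Fraction(1), Fraction(-1)
    x = solve(full, contrast)
    return x[a] - x[b]


# Blocks (0, 1, 2) and (3, 4, 5) are missing (s = 2); treatments 6, 7, 8
# have no missing replicate.
incomplete = BLOCKS[2:]
kr = K * R_EFF
expected = {
    "(i)": (6, 7, 2 / R_EFF),
    "(ii)": (0, 1, 2 / (R_EFF - 1)),
    "(iii)": (0, 3, 2 * (kr - 1) / (kr * (R_EFF - 1))),
    "(iv)": (0, 6, 1 / R_EFF + (kr - 1) / (kr * (R_EFF - 1))),
}
for label, (a, b, formula) in expected.items():
    print(label, difference_variance(incomplete, a, b), formula)
```

Exact fractions are used so that agreement between the direct computation and the closed-form expressions is an equality rather than a floating-point coincidence.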


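Analysis of variance


The treatment sum of squares in the standard analysis of the completed
data is

$$r' \sum_{g=1}^{t} t_g^2 .$$

The required correction is given by formula (24), and is

$$\sum_{j=1}^{s}\sum_{q=1}^{k} (u_{jq} - \bar{u}_j)^2 = \sum_{j=1}^{s}\sum_{q=1}^{k} (t_{jq} - \bar{t}_j)^2 ,$$

by virtue of (39). The correct treatment sum of squares is therefore

$$r' \sum_{g=1}^{t} t_g^2 - \sum_{j=1}^{s}\sum_{q=1}^{k} (t_{jq} - \bar{t}_j)^2 . \qquad (46)$$

This may be simplified to the form shown in Table 5, which gives the
analysis of variance for the incomplete design.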


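TABLE 5
ANALYSIS OF VARIANCE (INTRA-BLOCK)

Component                           d.f.                             S.S.
Blocks (ignoring treatments)        $b - s - 1$                      $\frac{1}{k}\sum B^2 - \frac{G^2}{N - sk}$
Treatments (eliminating blocks)     $t - 1$                          $\frac{1}{r'}\sum_g Q_{cg}^2 + \frac{1}{r' - 1}\sum_{j=1}^{s}\left[\sum_{q=1}^{k} (Q^\circ_{jq})^2 - \frac{1}{kr'}\left(\sum_{q=1}^{k} Q^\circ_{jq}\right)^2\right]$
Residual                            $(N - b - t + 1) - s(k - 1)$     By difference
Total                               $N - sk - 1$                     $\sum y^2 - \frac{G^2}{N - sk}$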


It will be found more convenient in practice to compute with the 
quantities $kQ$ and $kr' = \lambda t$, thereby avoiding unnecessary divisions.
The current practice, of course, is to incorporate the factor k in the 
definition of Q. 

The Blocks (eliminating treatments) component, if required, is 
computed by subtracting the Treatments (ignoring blocks) component 
from the combined Block and Treatment components of Table 5. 

If the missing blocks constitute a complete replication of treatments, 
the incomplete design is a standard P.B.I.B. design. It is interesting 
to note, however, that there is no essential simplification in the analysis 
for this case. This suggests that in the consideration of experimental 
designs, if the requirement of full balance is relaxed, the requirement of 
strict uniformity of treatment replication might also be relaxed, and 
lead to a wider range of designs with simple analysis. 


9. ACKNOWLEDGMENTS 


The writer is indebted to Dr. A. T. James for some valuable dis- 
cussions during the preparation of this paper, particularly in connection 
with singular matrices. The writer would also like to thank Dr. E. A. 
Cornish for his encouragement of this work. 


10. REFERENCES 


[1] Cochran, W. G., and Cox, G. M. Experimental Designs. John Wiley and 
Sons, New York, 1950. 

[2] Cornish, E. A. The estimation of missing values in incomplete randomized 
block experiments. Ann. Eug. 10, 112, 1940. 

[3] Cornish, E. A. The estimation of missing values in quasi-factorial designs. 
Ann. Eug. 10, 137, 1940. 

[4] Cornish, E. A. The analysis of quasi-factorial designs with incomplete data. 
J. Aust. Inst. Ag. Sc. 6, 31, 1940. 

[5] Cornish, E. A. The analysis of quasi-factorial designs with incomplete data: 
lattice squares. J. Aust. Inst. Ag. Sc. 7, 19, 1941. 

[6] DeLury, D. B. The analysis of Latin squares when some observations are 
missing. J. Am. Stat. Assoc. 41, 370, 1946. 

[7] Fairfield-Smith, H. Note: missing plot estimates. Biometrics 13, 115, 1957. 

[8] Tocher, K. D. The design and analysis of block experiments. J. Roy. Stat.
Soc. 14 (B), 45, 1951. 

[9] Wilkinson, G. N. The analysis of covariance with incomplete data. Biometrics 
13, 363-372, 1957.

[10] Wilkinson, G. N. Estimation of missing values for the analysis of incomplete
data. Biometrics 14, 257-286, 1958. 

[11] Yates, F. The analysis of replicated experiments when the field results are 
incomplete. Emp. J. Exp. Ag. 1, 129, 1933. 

[12] Yates, F. Incomplete Latin squares. J. Ag. Sc. 26, 301, 1936.

[13] Yates, F., and Hale, R. W. The analysis of Latin squares when two or more 
rows, columns, or treatments are missing. Supp. J. Roy. Stat. Soc. 6, 67, 1939. 




THE MATHEMATICAL FOUNDATIONS UNDERLYING 
THE USE OF LINE TRANSECTS IN ANIMAL ECOLOGY 


J. G. SKELLAM 
The Nature Conservancy, London, England 


INTRODUCTION 


Transect methods have been known and employed for many years 
in plant ecology and are particularly simple because of the stationary 
character of the objects studied. The estimation of populations of large 
animals such as birds by transects is, however, an extremely difficult and 
complex problem primarily because of their locomotion. There are two 
aspects to be considered: (1) the purely mathematical problems arising 
in abstract or ideal situations, and (2) the practical and technical 
difficulties encountered in standardizing the field procedure in order 
that a mathematical model be applicable. Though we are here concerned 
essentially with the mathematical aspects, it is nevertheless important 
to bear in mind the nature and magnitude of the various types of error 
arising from practical considerations, and it is emphasized that the 
biology and behaviour of the individuals of the population being esti- 
mated require thorough study before even a rough assessment can be 
made of the applicability of any census method in any particular 
instance. The ornithologist is particularly fortunate in this respect for 
the distribution, movements, and habits of many birds can and have 
been studied sufficiently to permit him to judge beforehand whether to 
count them on their breeding territory, at the roost, or on their feeding 
area, and to select the time when movements into, out, and within the 
study area are most nearly “‘frozen.”’ Even in favourable cases where 
the assumptions underlying a particular method seem well satisfied, a 
decision to employ that method is a judgment which bitter experience
indicates should be made with reserve, and alternative ways of confirming
the answer looked for. Moreover, results require careful scrutiny in
relation to established ecological knowledge. A rough first check, for
example, is possible in the case of birds, where the normal, upper, and
lower ranges of density in the main ‘habitat’ types (e.g., woodland,
moorland) are fairly well known, so that any freak figure is immediately 
suspect. 




The question of conspicuousness [Colquhoun, 1940] is indeed an 
important practical one which could render almost any direct census 
method inapplicable. Otters, for example, though larger than almost 
all British birds, so far defy census entirely, though certain insects such 
as dragonflies [Moore, 1953] appear to be suitable material for population 
studies. Awareness of the presence of an organism may arise through 
auditory rather than visual channels. But even with song birds, the 
period of song may be limited to certain phases of the breeding cycle, 
[Buxton, 1950] and may only approximate to continuity at certain 
times of the day. No experienced ornithologist, however, would consider 
taking a count without first obtaining a clear knowledge of seasonal, 
diurnal, and other patterns likely to affect the results, or without guarding 
against the potential errors arising through the inconspicuousness of 
certain species. 

At high densities the counting of moving organisms becomes humanly 
impossible by reason of the failure of the observer to distinguish between 
those organisms which have already been counted and those which 
have not, and to note the presence of the latter before they have moved 
on. Only because birds are so high in the ecological pyramid that their 
densities related to the available foraging area are relatively so low, 
is it normally possible to count them at all, and then not in all
circumstances. Special methods have been devised for enumerating large
aggregations, one of the best of which is by use of enlarged aerial or 
other photographs taken simultaneously with direct counts. 

Conversely, it has also to be borne in mind at the present stage of 
development of statistical ecology that the framing of a methodological 
basis for animal transect censuses must inevitably start with simple 
ideal situations, where the organisms are perfectly conspicuous and 
do not behave abnormally in the neighbourhood of the observer, and 
that apart from temporary expedients, the modifications and adjust- 
ments which must be made to simple models before they can be employed 
in circumstances where complicating factors cannot be ignored, are 
matters for future research. 

An important point which has to be borne in mind in this as in many 
other fields is the assessment of acceptable margins of error in relation 
to the use to be made of the material, either scientifically or practically. 
For certain purposes very high accuracy is essential or the data are 
valueless, but in other cases even errors of the order of 50% or higher 
may be acceptable provided the existence of such a degree of error is 
plainly understood. 


HISTORICAL 
Interest in census problems has been stimulated greatly by the
efforts made by ornithologists during the last half century, and the
principal methods devised and difficulties encountered have been 
discussed by Palmgren [1930], Nicholson [1931], Lack [1937], Nordberg 
[1947], and many others. Without doubt, wild bird populations exhibit 
a quite exceptional inherent suitability for the study of direct methods 
of enumeration, for not only is a considerable body of field knowledge 
already acquired, but large enough numbers of observers active in the 
field make it possible to build up a large body of census results. 

Fundamentally, there have been two main procedures. In the 
first, defined areas (or more conveniently, representative samples of 
them) are submitted to a thorough direct enumeration (Probeflächenmethode),
and the density estimated in accordance with the definition
by computing the ratio between the number of separate organisms (or 
sometimes pairs) observed and the area examined. The birds appear 
to be treated in principle almost as if they were stationary, and on 
discovery the usually recommended practice is to record them on a 
map of the area. Knowledge and field experience of bird biology and 
behaviour are brought to bear to avoid multiple recording of the same 
individual or pair. For example, most passerine birds at the onset of 
the breeding season establish more or less well defined territories or 
preferred areas [Hinde, 1952] and in these build their nests. The 
recognition that two appearances of cock birds in the same region 
might have to be attributed to a single individual helps considerably in 
enhancing accuracy. By repeated recording, carried out compartment 
by compartment, and by adopting the convention whereby any observed 
figure is regarded as a lower bound (Minimalzahl), ornithologists usually
consider it possible to assess very roughly the number of birds escaping 
notice in a single attempt at enumeration, and to arrive at a more 
reliable final estimate [Palmgren, 1930]. 

The second procedure (the Transect or Linientaxierungsmethode) 
may be regarded as a development of the first, brought about by choosing 
the sampling area in the form of a long, narrow belt, lying on one or
both sides of the path taken by the observer, as he walks [Yapp, 1956] 
or rides [Nicholson, 1931; Southern, 1944] through the region being 
studied. Though this method is often adopted for ease and convenience 
(the first method, for example, being impracticable at sea) and the 
results used primarily for comparative purposes rather than the assess- 
ment of absolute numbers, transect procedures have a special merit in 
that they embrace a wider range of diversity in the habitat under 
general consideration than does a set of compact sample areas of com- 
parable total extent, a result which is usually reflected by the greater 
number of species observed. But as Nordberg [1947] has pointed out, 
it is not permissible to estimate the true population from repeated
recordings of the same transect by the arguments and numerical devices
usually employed in the case of compact areas. The transect belt is in 
fact so narrow that chance variation plays an important role, and a 
source of error arises from accidental encounters with birds nesting 
outside the strip. By dividing the transect into short segments, and 
adopting the largest recorded figure for each, it is by no means certain, 
in the case at least of fully conspicuous organisms, that the density 
might not be seriously over-estimated. 

Though the procedures outlined above do not appear, for under- 
standable reasons, to have been at all rigidly defined, the underlying 
mathematical argument tacitly omits the movement of the organisms, 
this being looked on as a troublesome complication, to be overcome, 
often apparently with considerable success, by sound judgment and 
field experience. The first attempt to remodel the methodological 
basis of transect censuses, so as to take the movements of the organisms 
into account in the calculations, is due to Yapp who early in 1953 in a 
personal communication to the author suggested that the encounters 
between a moving observer and the individuals of a mobile species 
could be likened to the collisions between a molecule of one kind and 
molecules of another, and he proposed making use of the analogy to 
estimate bird populations by applying a formula given by the immediate 
application of the classical kinetic theory of gases. His argument on 
these lines appears as an appendix in Yapp [1955]. On analysis his 
result can be resolved into two formulae, which it is here proposed to 
call Yapp’s first and second formulae: 


(i) $D = z/(2RV)$, (1)
where $D$ = density of population studied,
      $z$ = number of encounters per unit time,
      $V$ = average velocity of the organisms relative to
            the moving observer,
      $R$ = range or radial distance within which an organism
            must approach the observer to effect an encounter.
(ii) $V^2 = \bar{u}^2 + \bar{w}^2$, (2)
where $\bar{u}$ = average velocity of the organisms,
      $\bar{w}$ = average velocity of the observer.


In deriving these results Yapp assumed the motion of the organisms, 
like that of molecules, to be rectilinear between encounters. Since to 
many biologists this condition might appear unacceptable in natural 
ecological situations, I examined the model on less restrictive assumptions,
allowing the organisms to move in any way whatsoever provided
that their paths were rectifiable, that is, could be represented to any
arbitrary preassigned degree of approximation (no matter how close) 
by a chain of minute straight links. The result was that Yapp’s first 
formula was also valid under this general and acceptable condition. 

The second formula, however, is true in the physical 3-dimensional 
analogue, not because of the shape of the paths, but because of the 
peculiar form of the distribution of velocities which molecules enjoy. 
It would also be valid in the biological 2-dimensional case if the organ- 
isms and the observer both satisfied a distribution of velocities which is 
the 2-dimensional analogue of Maxwell’s distribution of molecular 
velocities [1860]. Since this can hardly be expected, Yapp’s second 
formula will not be in general strictly true, though it would appear 
(see later sections) to be adequate in a large number of practical situa- 
tions in which it might be employed. 


YAPP’S FIRST FORMULA 


The applicability of a mathematical model to a biological situation 
rests on the correspondence which can be set up between the simple
abstract elements of the one and the real and highly complex elements 
of the other, and its value depends on the closeness with which the 
relations connecting the elements of the model hold also for the corre- 
sponding biological components. It is inevitable that the two things 
should be described in different language. However, in the present 
treatment it may perhaps help to remember that the term particle 
corresponds to animal and the term contour to the boundary of the 
field of perception of the observer. 

In a large plane region (see Fig. 1) let there be a large number of 
particles moving about in rectifiable paths not necessarily independently 
(e.g., in groups), and let the average density of the particles be D. An 
observer O moves independently across the region in an equally arbitrary 
manner, his path being termed a transect. Associated with the observer 
is a frame of reference (such as a set of ordinary cartesian coordinates), 
and marked out on this frame is a closed contour. It is not assumed that 
the particles or the observer move with uniform velocity, but only that 
in an infinitesimally small interval of time the velocities may be regarded 
as such. It is also assumed that the behaviour of the particles is the 
same in the neighbourhood of the contour as elsewhere, irrespective of 
local variations in probability density. This condition restricts the 
applicability of the mathematical model to biological situations where 
the organism being counted is indifferent to the presence of the observer, 
and would probably not hold in the case of timid birds or attracted 
tsetse flies. For the present it is assumed that all parts of the area are
equally likely either to contain particles or to be visited by the observer.
More strictly stated, the probability density is taken to be uniform for
all transects or segments of the same transect planned without prior
knowledge of the spatial distribution of density in the area.


FIGURE 1.
AN OBSERVER’S SAMPLING REGION

In this section of the paper the velocities of the particles, both in 
magnitude and direction, are measured relatively to the observer, 
that is, on the imaginary frame of reference which he carries about 
with him. It may be convenient for ecologists not accustomed to the 
idea of relative velocity to think of the framework of reference as a
material surface on which the particles trace out a permanent record
of their movements, and that we are concerned with the directions and
rates at which these tracks are being added to. Thus at time $t$ a particular
particle may have a relative velocity with direction $\theta(t)$ and magnitude
$v(t)$.

Let the particles be classified at time $t$ on the basis of the magnitude
of their relative velocities and on the relative direction of their motion,
so that $f(v, \theta, t)\,dv\,d\theta$ represents the proportion at time $t$ having speeds
in the elementary interval $v \pm \frac{1}{2}dv$ and directions in $\theta \pm \frac{1}{2}d\theta$.


FIGURE 2.
THE MOTION OF PARTICLES
[Figure labels: position of a particle at time $t$ and at time $t + dt$.]


In the following treatment it is assumed that the contour encloses 
a convex figure, though the results with appropriate modifications can 
be extended to figures with concave indentations of the boundary. 

Let two parallel tangents to the contour having direction $\theta$ be drawn
(Fig. 2) and let the distance between them be $H(\theta)$. Let a border of
width $v(t)\,dt$ be marked off and shaded as in the figure.

Then any particle of the class with speed $v \pm \frac{1}{2}dv$ and direction
$\theta \pm \frac{1}{2}d\theta$ will cut the contour (from without) in the elementary interval
of time from $t$ to $t + dt$ if and only if it lies in the shaded border at time $t$.
Since the area of this band is $H(\theta)v(t)\,dt$, the expected number of particles
of the stated class lying in it will be $D f(v, \theta, t)\,dv\,d\theta\,H(\theta)v(t)\,dt$.

The expected total number cutting the contour from without in
the minute interval of time $dt$ is given by integrating over all classes, that
is over all values of $v$ and $\theta$; and the expected total number $\mathcal{E}\{n\}$ doing
so in the whole interval from $T_1$ to $T_2$ by integrating over $t$. Thus

$$\mathcal{E}\{n\} = D \int_{T_1}^{T_2}\int_0^{\infty}\int_0^{2\pi} f(v, \theta, t)\,v(t)\,H(\theta)\,d\theta\,dv\,dt. \qquad (3)$$




The Parameter H 


When the contour is not a circle and it is impracticable to evaluate 
the integral in (3), it is, nevertheless, possible to fix upper and lower 
bounds to it by employing the mean value theorem. 

Because $f$, $v$ are both positive, we may write

$$\mathcal{E}\{n\} = D H \int_{T_1}^{T_2}\int_0^{\infty}\int_0^{2\pi} f(v, \theta, t)\,v(t)\,d\theta\,dv\,dt,$$

where $H$ is some value intermediate between the least and greatest
possible values of $H(\theta)$.
Since the mean value of $v$ for the period involved is by definition

$$V = \frac{1}{T}\int_{T_1}^{T_2}\int_0^{\infty}\int_0^{2\pi} f(v, \theta, t)\,v(t)\,d\theta\,dv\,dt,$$

we obtain $\mathcal{E}\{n\} = DHTV$ where $T = T_2 - T_1$.

In the special case where the contour is a circle so that $H = 2R$,
this reduces in form to Yapp’s first formula $D = z/(2RV)$, where,
however, $z = \mathcal{E}\{n\}/T$ may be interpreted as the expected value of
$n/T$, the average rate of occurrence of encounters for the period concerned.
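The relation $\mathcal{E}\{n\} = DHTV$ can be turned round to estimate the density from an observed count. The following minimal sketch substitutes the observed number of encounters for its expectation; the function name and the numerical values (30 encounters in 2 hours, a circular contour of radius 0.05 km, a mean relative speed of 3 km per hour) are assumptions made for illustration, not field data.

```python
def density_estimate(encounters, duration, width, mean_relative_speed):
    """Estimate D from E{n} = D * H * T * V, with n in place of E{n}.

    encounters: observed number of encounters n
    duration: length of the observation period T
    width: the parameter H (2R for a circular contour of radius R)
    mean_relative_speed: mean speed V of the organisms relative to the observer
    """
    return encounters / (width * duration * mean_relative_speed)


# Hypothetical transect: R = 0.05 km, so H = 2R = 0.1 km.
d = density_estimate(encounters=30, duration=2.0, width=0.1, mean_relative_speed=3.0)
print(f"estimated density: {d:.1f} per square km")
```

The units of the result are set by those of the inputs: with lengths in kilometres and time in hours, the density is per square kilometre.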

In general, this result should be more than sufficient for normal 
ecological practice, when we consider the very considerable technical 
sources of error involved even in determining the shape and extent of 
the contour. 

In the particular case where the observer is at rest and it is reasonable
to rule out the existence of specially favoured directional movement on
the part of the particles, $f(v, \theta, t)$ is independent of $\theta$. Equation (3)
then reduces to the product of two integrals and

$$H = \frac{1}{2\pi}\int_0^{2\pi} H(\theta)\,d\theta.$$

By a result due to Legendre (see Edwards I, p. 543 §532), it is known
that the perimeter $s$ of a closed oval of continuous curvature is given by
$s = \int_0^{2\pi} p(\theta)\,d\theta$, where $p(\theta)$ is the length of the perpendicular from a
fixed point within the figure to a tangent with direction $\theta$, this angle
taking on successive values from 0 to $2\pi$ as the tangent “rolls” round
the figure back to its initial position. Since $H(\theta) = p(\theta) + p(\theta + \pi)$,
it follows immediately that
$H = s/\pi$, a result which may also be derived from first principles, as an
alternative to the method used above, by considering the flow of particles
across an element $ds$ of the contour.
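The relation $H = s/\pi$ is easy to check numerically for a simple shape. The sketch below uses a rectangle, which is an assumption made for convenience: the text states the result for ovals of continuous curvature, and the rectangle relies on the same relation holding for convex figures generally. For a rectangle with sides $a$ and $b$, the distance between parallel tangents of direction $\theta$ is $a|\sin\theta| + b|\cos\theta|$.

```python
import math


def mean_width_rectangle(a, b, steps=100_000):
    """Average over all tangent directions of the distance between the two
    parallel tangents to an a-by-b rectangle (midpoint rule)."""
    total = 0.0
    for i in range(steps):
        theta = 2.0 * math.pi * (i + 0.5) / steps
        total += a * abs(math.sin(theta)) + b * abs(math.cos(theta))
    return total / steps


a, b = 3.0, 1.0
perimeter = 2.0 * (a + b)
print(mean_width_rectangle(a, b), perimeter / math.pi)
```

The two printed numbers should agree to within the accuracy of the numerical integration.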

When the observer moves sufficiently rapidly, the particles will
almost invariably cut the contour at the observer’s front, so that the
appropriate value of $H$ for this limiting case is the distance between
tangents drawn parallel to the direction of motion.

In general $H$ lies between $s/\pi$ and this limiting distance, according to
the relative magnitudes of the velocities of particle and observer.


The Effect of Rotation 


When the observer turns round with his frame of reference, the 
relative velocity v(t) of a particle is usually increased, especially if 
located at a considerable distance from the observer. For an observer 
rotating at a fixed spot, the relative motion of a fixed particle is in a 
circle. Particles, which otherwise would never be encountered, will, if 
sufficiently near to the observer, cut the contour one or more times in 
every complete rotation. The only exception arises when the contour 
is itself a circle with the observer at the centre, and then rotation has 
no effect whatsoever on increasing the number of encounters, for in
this case we need not rotate the frame of reference associated with the 
observer but merely allow the contour conceived as a material hoop to 
rotate instead. 

If the shape of the contour differs considerably from that of a circle 
with centre the observer, as would happen in purely visual studies 
where also rotation of the eyes and head would be inevitable, it would 
appear that an element of commonsense judgment is called for and 
assumed on the part of the observer in order that he can discount 
repeated encounters of the same particle due solely to repeated scanning 
of his immediate environment. If a commonsense continuity principle 
is adopted, the field of perception is not that given instantaneously but 
that synthesized over a short period of time from a rapid succession of 
observations covering a wide range of directions. It will in general 
have a more circular character than before, the possible deficiency in 
the rear being compensated for by the greater attention given ahead. 
Even so, the observer would not be at the centre of the field, and rapid 
changes in his direction of movement might be a serious source of 
complexity. Much of the error from this source could however be 
avoided fairly easily by suitable definition of the transect procedure 
so as to exclude marked changes in direction. This would certainly be
better than attempting to incorporate the effects of rotation into the 
general formulation. Henceforth it will be assumed that rotation has 
been excluded. 


The Effect of Heterogeneity


Consider what happens in a heterogeneous region, and for simplicity
but without loss of generality, suppose that the area can be subdivided
into a large number ($N$) of sub-areas in each of which the earlier stated
conditions hold. Then

$$\mathcal{E}\{n\} = \sum_{j=1}^{N} D_j H_j T_j V_j , \qquad (5)$$

where $T_j$ denotes the expected amount of time spent by the observer
in the $j$th sub-area.

If the values of $D_j$, $H_j$, $T_j$, $V_j$ vary from place to place in such a
manner that their distributions are statistically independent, the mean
value of their product is the product of their mean values, and (5)
reduces to

$$\mathcal{E}_0\{n\} = DHTV, \quad \text{where } T = \sum T_j . \qquad (6)$$


If, however, in regions in which the observer could only move slowly 
there were strong tendencies for the particles to be more dense and more 
active and for the contour to expand, formula (3) would give too small 
a value to the expected number of encounters. 

The ratio between (5) and (6) is, of course, independent of the units
in which the parameters $D_j$, $H_j$, $T_j$, $V_j$ are measured and, though
variable, these quantities cannot be negative. It is perhaps surprising,
however, that even with considerable heterogeneity and marked correlation
between them, the error involved is often not great, particularly
if the correlations are not all of the same sign.
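The effect of correlation between the sub-area parameters can be seen in a toy calculation. The sketch below compares the sum over sub-areas, as in (5), with the product of the mean values multiplied by the total time, as in (6). The sub-area values are invented for illustration and the function names are hypothetical.

```python
def encounters_by_subarea(areas):
    """Expected encounters as the sum of D_j * H_j * T_j * V_j, as in (5)."""
    return sum(d * h * t * v for d, h, t, v in areas)


def encounters_pooled(areas):
    """Expected encounters from mean D, H, V and total time T, as in (6)."""
    n = len(areas)
    mean_d = sum(a[0] for a in areas) / n
    mean_h = sum(a[1] for a in areas) / n
    total_t = sum(a[2] for a in areas)
    mean_v = sum(a[3] for a in areas) / n
    return mean_d * mean_h * total_t * mean_v


# Each tuple is (D, H, T, V) for one sub-area; all values are invented.
independent = [(1, 1, 1, 2), (1, 1, 1, 4), (3, 1, 1, 2), (3, 1, 1, 4)]
correlated = [(1, 1, 1, 2), (3, 1, 1, 4)]  # dense sub-area is also the more active

for name, areas in [("independent", independent), ("correlated", correlated)]:
    ratio = encounters_by_subarea(areas) / encounters_pooled(areas)
    print(f"{name}: ratio of (5) to (6) = {ratio:.3f}")
```

In the first set density and velocity vary independently and the two expressions coincide. In the second they are positively correlated, and the pooled form understates the sum by about one seventh.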


YAPP’S SECOND FORMULA 


In this section we are concerned with the derivation of the mean 
relative velocity V, given information about the absolute velocities 
(u, w) of the particles and of the observer respectively and with the 
acceptable assumption that the directions of the particles are randomly 
and uniformly distributed with respect to the direction of motion of 
the observer. 


A General Result 


If we are given two fixed velocities of magnitude $u$ and $w$, and if
$\theta$ is the angle between them, the magnitude of the relative velocity is
given by forming the triangle of velocities and applying the cosine rule.


$$v^2 = u^2 + w^2 - 2uw\cos\theta. \qquad (7)$$


If now u, w, and @ are variable, and if the distribution of @ is inde- 
pendent of uw, we have, on taking expectations in (7), 


&{v"} = &{u*} + 


le. 


LINE TRANSECTS IN ANIMAL ECOLOGY 395 


since 
cos 0} = {cos 6}, 


and 


cos = 0. 


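Now for any random variable z, $\mathcal{E}\{z^2\} = \bar{z}^2 + \operatorname{var} z$, so that we obtain

$$V^2 = \bar{u}^2 + \bar{w}^2 + \operatorname{var} u + \operatorname{var} w - \operatorname{var} v \tag{8}$$

in contrast to Yapp's formula

$$V^2 = \bar{u}^2 + \bar{w}^2. \tag{9}$$

If u and w are constant, it is clear that the second formula (9) gives a
value in excess of the true value (8).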


Case of Velocities of Fixed Magnitude 


The mean value of the relative velocity for vectors u and w of fixed
magnitude is

$$V = \frac{1}{2\pi}\int_0^{2\pi} v\,d\theta, \quad \text{where } v = (u^2 + w^2 - 2uw\cos\theta)^{1/2}.$$

By symmetry and the substitution $\cos\theta = 1 - 2\sin^2\tfrac{1}{2}\theta$, the integral
readily reduces to

$$V = \frac{2(u + w)}{\pi}\int_0^{\pi/2}\left[1 - \frac{4uw}{(u + w)^2}\sin^2\phi\right]^{1/2} d\phi.$$

This can be evaluated immediately using tables of the elliptic integral
[Legendre, 1825],

$$E(\psi) = \int_0^{\pi/2}(1 - \sin^2\psi\,\sin^2\phi)^{1/2}\,d\phi,$$

by setting $\psi = \arcsin\,[2(uw)^{1/2}/(u + w)]$. The values of this exact solution
(V) and those of Yapp's formula ($V_0$) are tabulated below for a series
of values of p = u/w. It is easily proved analytically that the greatest
error occurs when p = 1 (and is then about 10%) and that the error
progressively decreases to zero as p moves away from this value. It
will be apparent from the table that, for values of p very small or very
large compared with unity, the error involved is almost negligible
when considered in relation to other sources of inaccuracy.
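The comparison between the exact mean relative velocity and Yapp's formula can also be checked without elliptic-integral tables. The sketch below is an illustration only: it replaces the tabulated integral with a plain midpoint-rule quadrature, and the function names and step count are assumptions, not anything given in the text.

```python
import math


def mean_relative_speed(u, w, steps=20000):
    """Midpoint-rule estimate of (1/2pi) * integral over [0, 2pi] of
    sqrt(u^2 + w^2 - 2uw cos(theta)) d(theta), for fixed speeds u and w."""
    h = 2.0 * math.pi / steps
    total = 0.0
    for i in range(steps):
        theta = (i + 0.5) * h
        # max() guards against a tiny negative value from rounding when u == w.
        total += math.sqrt(max(0.0, u * u + w * w - 2.0 * u * w * math.cos(theta)))
    return total * h / (2.0 * math.pi)


def yapp_speed(u, w):
    """Yapp's second formula for fixed speeds: V0 = sqrt(u^2 + w^2)."""
    return math.sqrt(u * u + w * w)


# Ratio V / V0 for p = u/w from 0 to 1 in steps of 0.1, taking w = 1.
for tenth in range(11):
    p = tenth / 10.0
    print(p, round(mean_relative_speed(p, 1.0) / yapp_speed(p, 1.0), 2))
```

For equal speeds the integral has the closed form 4u/π, which gives a convenient check on the quadrature.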




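TABLE 1
THE RELATIVE SIZES OF $V = (1/2\pi)\int_0^{2\pi}(u^2 + w^2 - 2uw\cos\theta)^{1/2}\,d\theta$
AND $V_0 = (u^2 + w^2)^{1/2}$ FOR VARIOUS VALUES OF p = u/w

p        0     0.1   0.2   0.3   0.4   0.5   0.6   0.7   0.8   0.9   1.0
1/p      ∞     10    5     3.33  2.5   2     1.67  1.43  1.25  1.11  1
V/V_0    1     1.0   .99   .98   .97   .95   .94   .92   .91   .90   .90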


An Important Special Case 

The analogue of Maxwell's distribution appropriate to the present
2-dimensional problem is $dF(u) = e^{-cu^2}\,2cu\,du$, where dF(u) denotes
the probability that a particle picked at random has a speed in the
elementary interval $u \pm \tfrac{1}{2}du$, and c is a parameter whose value depends
on the nature of the particles and on the conditions. The mean velocity
is

$$\bar{u} = \mathcal{E}\{u\} = \int_0^\infty u\,e^{-cu^2}\,2cu\,du.$$

The substitution $s = cu^2$ then gives

$$\bar{u} = c^{-1/2}\int_0^\infty s^{1/2}e^{-s}\,ds = \tfrac{1}{2}(\pi/c)^{1/2}. \tag{11}$$

For comparison with a later result we need to know the form of the
moment generating function of $u^2$. By definition this is

$$\mathcal{E}\{e^{tu^2}\} = \int_0^\infty e^{tu^2}e^{-cu^2}\,2cu\,du = \int_0^\infty e^{-(1 - t/c)s}\,ds = \left(1 - \frac{t}{c}\right)^{-1}. \tag{12}$$


We now show that, if the observer's velocity w also varies in the above
way but with a different parameter value (k), then the resultant velocity
v has the same kind of distribution but with parameter kc/(k + c).
This result implies [using (11)] that $\bar{u}$, $\bar{w}$, V are proportional respectively
to

$$c^{-1/2}, \quad k^{-1/2}, \quad [kc/(k + c)]^{-1/2},$$

from which it follows immediately that $V^2 = \bar{u}^2 + \bar{w}^2$, which is Yapp's
second formula.




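To prove the statement, we deduce the moment generating function
of the distribution of $v^2 = u^2 + w^2 - 2uw\cos\theta$ and show that it has
the same form as (12). This function is

$$\mathcal{E}\{e^{tv^2}\} = \frac{1}{2\pi}\int_0^{2\pi}\!\int_0^\infty\!\int_0^\infty e^{t(u^2 + w^2 - 2uw\cos\theta)}\,e^{-cu^2}\,2cu\;e^{-kw^2}\,2kw\;du\,dw\,d\theta. \tag{13}$$

Now

$$\frac{1}{2\pi}\int_0^{2\pi}\exp(\pm q\cos\theta)\,d\theta = I_0(q) = \sum_{n=0}^{\infty}\frac{(\tfrac{1}{2}q)^{2n}}{n!\,n!}. \tag{14}$$

Integrating (13) with respect to $\theta$, using (14), now gives

$$\begin{aligned}
\mathcal{E}\{e^{tv^2}\} &= \sum_{n=0}^{\infty}\left[\int_0^\infty \frac{t^n u^{2n}}{n!}\,e^{-(c-t)u^2}\,2cu\,du\right]\left[\int_0^\infty \frac{t^n w^{2n}}{n!}\,e^{-(k-t)w^2}\,2kw\,dw\right] \\
&= \sum_{n=0}^{\infty}\frac{ck\,t^{2n}}{[(c-t)(k-t)]^{n+1}} = \frac{ck}{(c-t)(k-t) - t^2} = \left[1 - \frac{t(k + c)}{kc}\right]^{-1},
\end{aligned} \tag{15}$$

where in the first line we have twice used the result

$$\int_0^\infty e^{-x}x^n\,dx = n!,$$

and in the second have simply summed the geometric series.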

This remarkable result, that the relative velocity has a probability 
distribution with the same general mathematical form as that of the 
components, does not appear to be true in any other cases I have studied. 
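A simulation gives an independent check on this special case. The sketch below is illustrative only and assumes nothing beyond the distribution stated above: if u has density $2cu\,e^{-cu^2}$ then $u^2$ is exponential with rate c, which is how the speeds are drawn. The function names, sample size, seed, and the parameter values c = 1 and k = 4 are all hypothetical choices.

```python
import math
import random


def simulate_mean_relative_speed(c, k, samples=200_000, seed=1):
    """Monte Carlo estimate of the mean relative speed V when particle and
    observer speeds follow the 2-dimensional Maxwell analogue with
    parameters c and k, and the angle between them is uniform."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(samples):
        # u^2 and w^2 are exponential with rates c and k respectively.
        u = math.sqrt(rng.expovariate(c))
        w = math.sqrt(rng.expovariate(k))
        theta = rng.uniform(0.0, 2.0 * math.pi)
        total += math.sqrt(max(0.0, u * u + w * w - 2.0 * u * w * math.cos(theta)))
    return total / samples


def mean_speed(c):
    """Formula (11): mean speed (1/2) * sqrt(pi / c)."""
    return 0.5 * math.sqrt(math.pi / c)


def yapp_prediction(c, k):
    """Yapp's second formula applied to the two mean speeds."""
    return math.sqrt(mean_speed(c) ** 2 + mean_speed(k) ** 2)


print(simulate_mean_relative_speed(1.0, 4.0), yapp_prediction(1.0, 4.0))
```

The two printed values should agree to within sampling error, in contrast to the fixed-speed case, where Yapp's formula overstates V.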


THE ESTIMATION OF DENSITY 


Formula (4) provides us immediately with an unbiased estimator of
D when V, H, T are given, namely: $\hat{D} = n/(HTV)$, where n denotes
the actual number of encounters in the course of a transect. For every
observed value of n, we can compute a corresponding value of $\hat{D}$, and
it is clear that the variance of $\hat{D}$ is proportional to that of n. The
basic question, therefore, is "What is the variance of n?", a question
that can be approached both theoretically and practically.

The obvious practical course would be to repeat the experiment a
number of times, or, if this were not permissible, to conduct the
experiment so as to be able to split the transect up into a large number of
separate pieces and then to apply the principle that the variance of the
mean of $\nu$ independent values is $1/\nu$ times the variance of a single value.
The latter can be estimated directly from the data in the usual way.
Since it is based on $\nu - 1$ degrees of freedom, it will not be very reliable
if $\nu$ is small. But the important thing is not to know this variance
with high accuracy but merely to use it to assess the order of reliability 
that can be attached to a particular estimate of density, a consideration 
which often has to be faced at the planning or preliminary stage of a 
census before the results of replicate transects are available. 

A rigorous theoretical approach to the problem is not only difficult 
but appears to call for the development of new mathematical tools. 
Whereas the expected number of encounters does not appear to depend 
on the shapes of the paths, the variance of the number of encounters 
does. Nevertheless, by theoretical argument it is possible to conjecture 
the general character of the result. It seems that, if the particles sweep 
across the observer’s space without any special tendency to double 
back on their tracks or to execute oscillatory movements, and as it 
were to become merged into the general picture, then the number of 
encounters for a fixed interval of time will be a Poisson variate. The 
variance and mean are then equal. If the particles are aggregated in 
groups, say of size g, then the variance will be g times the mean. Any 
complicated folding back of the particles on their tracks will have an 
effect on the variance equivalent to that of aggregation. Again, hetero- 
geneity (local variations in probability density) will increase the magni- 
tude of the sampling variance. 
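The conjecture about aggregation can be illustrated with a small simulation. This is a sketch under a deliberately simple assumption that is not in the text: groups are encountered as a Poisson process and every group contributes exactly g encounters. The function names, rate, trial count, and seed are hypothetical.

```python
import random


def poisson_sample(rate, rng):
    """Count arrivals in unit time with exponential gaps (a Poisson variate)."""
    count = 0
    elapsed = rng.expovariate(rate)
    while elapsed < 1.0:
        count += 1
        elapsed += rng.expovariate(rate)
    return count


def variance_to_mean_ratio(group_size, group_rate, trials=20_000, seed=1):
    """Simulate encounter counts when each encountered group holds
    `group_size` particles, and return sample variance / sample mean."""
    rng = random.Random(seed)
    counts = [group_size * poisson_sample(group_rate, rng) for _ in range(trials)]
    mean = sum(counts) / trials
    variance = sum((x - mean) ** 2 for x in counts) / (trials - 1)
    return variance / mean


print(variance_to_mean_ratio(1, 5.0))  # ungrouped particles
print(variance_to_mean_ratio(3, 5.0))  # groups of three
```

Under this assumption the ratio for single particles should come out near 1 and the ratio for groups of size g near g.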

In order to provide concrete support for the theoretical formulae 
given earlier and the conjectures outlined above, it was thought desirable 
to set up laboratory experiments on Monte Carlo lines. It was necessary, 
of course, to represent the continuous motion of the particles and the 
observer by somewhat abrupt movements in discrete time. The 
scheme adopted was perhaps the simplest approximation possible. The 
particles were represented by coloured pins randomly distributed on a 
triangular lattice provided by large sheets of triangular graph paper. 
From any point on the lattice, six directions were possible, and the 
directions of movement of the particles were determined by the random 
procedure of throwing dice. By assigning fresh random directions to 
the particles individually after each move, the latter assumed the 
irregular paths required of them. The shape of the contour was chosen 
to be hexagonal and its size was such that the contour passed midway 
between the lattice points just inside it and those just outside. The 
path of the observer was determined arbitrarily in advance well within 
the confines of the whole field. In order to maintain a constant overall 
density of particles, the convention was adopted that any particle 
leaving the area automatically re-entered at a corresponding point on 
the opposite side. The magnitude of the velocity of the observer was 
taken as two units and that of the particles as one. 

There were two ways of counting the number of encounters per unit 




of time. In method I, after each completed move counts were made 
of the particles now inside but previously outside. Since in a continuous 
system it is possible for a particle to move in and out again in any 
finite interval of time, this method of counting apparently leads to 
underestimates. In method II, each completed move was considered 
in two parts: first the pins were moved and a count taken; then the 
observer moved and a second count taken. This method allows for 
the recording of encounters which in a continuous system need not 
necessarily occur, and therefore yields over-estimates. Though the 
purpose of the experiment was primarily to assess the order of magnitude 
of the variance, and could hardly be expected to yield more than a 
crude estimate of the density, it is of interest to note that whereas the 
true value of D was 0.08, the estimates obtained by the application of 
the appropriate formulae (already derived on the basis of a continuous 
model) to the observed results were for the two methods of counting 
0.072 and 0.099 respectively, the standard errors of these figures being 
estimated from the data as approximately 0.007 and 0.010. 

Several series of short transects were made and for each series the 
ratio of the observed value of the variance to the mean number of 
encounters was calculated. The results are summarized below. Statis- 
tical analysis confirms the conjectures, (1) that the variance and mean 
are of the same order of magnitude, and (2) that with moderately irreg- 
ular paths, such as those occurring in these experiments (shape being 
considered in the space of the observer), the ratio actually exceeds 
unity. 


TABLE 2
SUMMARY OF EXPERIMENTAL RESULTS

                                      Ratio of Variance to Mean
Duration of   Number of        Analogue I              Analogue II
Transect      Replicates    Series A   Series B     Series A   Series B

3 moves           40          1.16       1.09         1.32       1.57
6 moves           20          1.59       0.93         1.98       1.42


It follows from the proportionality between $\hat{D}$ and n that

$$\frac{\text{s.d. of } \hat{D}}{D} = \frac{\text{s.d. of } n}{\mathcal{E}\{n\}} = \frac{Q}{\sqrt{\mathcal{E}\{n\}}}, \quad \text{where } Q^2 = \frac{\operatorname{var} n}{\mathcal{E}\{n\}}. \tag{16}$$

In practice $\mathcal{E}\{n\}$ can be replaced by the observed value n, and, from the
argument given above, Q can be regarded as a number in general
somewhat greater than unity, its value depending on the degree of aggregation
of the moving particles or the character of their paths or the
heterogeneity of the area. Even where Q cannot be assessed precisely, it may
nevertheless be possible to set some reasonable bounds to it and thereby
set bounds to the estimates of density.
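As a worked illustration of the estimator and of formula (16), the sketch below computes a density estimate and its relative standard deviation for two trial values of Q. All the input figures are hypothetical, as are the function names; the observed n stands in for its expectation, as suggested in the text.

```python
import math


def density_estimate(n, H, T, V):
    """Estimate of density: observed encounters divided by H * T * V."""
    return n / (H * T * V)


def relative_sd(n, Q=1.0):
    """Formula (16) with the expected count replaced by the observed n:
    (s.d. of the estimate) / (density) = Q / sqrt(n)."""
    if n <= 0:
        raise ValueError("n must be positive")
    return Q / math.sqrt(n)


# Hypothetical transect: 100 encounters, H = 10, T = 5, V = 2 (any consistent units).
n, H, T, V = 100, 10.0, 5.0, 2.0
estimate = density_estimate(n, H, T, V)
for Q in (1.0, 1.5):  # trial lower and upper values for Q
    print(Q, estimate, relative_sd(n, Q))
```

Running the calculation for a low and a high trial value of Q is one simple way of bracketing the reliability of the estimate when Q itself is uncertain.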

Mathematically speaking H (or 2R) can be chosen arbitrarily. It is
at the disposal of the observer to be used as he finds most convenient.
Other considerations being equal, the larger H, the greater $\mathcal{E}\{n\}$ and the
greater the precision of $\hat{D}$. The quantity V is here treated as a parameter
regarded as calculable once adequate information about the velocities
of the particles and the observer is given, though the actual estimation
of these directly raises practical problems of considerable magnitude.
The effects of errors in the use of the contour or in the estimation of V
are not, of course, included in formula (16) above.


ACKNOWLEDGMENTS 


I am grateful to Mr. E. M. Nicholson for the benefit of his views on 
the broad practical issues, and to Miss J. R. Proctor for the careful 
execution of the laboratory experiments. 


REFERENCES 


Buxton, J. [1950]. The Redstart. London: Collins.

Colquhoun, M. K. [1940]. Visual and auditory conspicuousness in a woodland bird
community: a quantitative analysis. Proc. Zool. Soc. London, A, 110: 129-48.

Edwards, J. [1921]. The Integral Calculus. I. Macmillan.

Lack, D. [1937]. A review of bird census work and bird population problems.
Ibis: 369-395.

Legendre, A. M. [1825]. Traité des Fonctions Elliptiques, 2. Paris. Tables re-issued
with an introduction by K. Pearson [1934]. Cambridge University Press.

Maxwell, J. C. [1860]. Illustrations of the dynamical theory of gases. Part I.
On the motions and collisions of perfectly elastic spheres. Phil. Mag. [4], 19:
19-32.

Moore, N. W. [1953]. Population density in adult dragonflies (Odonata-anisoptera).
J. Anim. Ecol. 22: 344-359.

Nicholson, E. M. [1931]. The Art of Bird Watching. London: Witherby.

Hinde, R. A. [1952]. The Behaviour of the Great Tit (Parus major) and some other
related species. Behaviour. Supp. II. x + 199 pp.

Nordberg, Sven [1947]. Ein Vergleich zwischen Probeflächenmethode und
Linientaxierungsmethode bei quantitativen Aufnahmen des Vogelbestandes. Ornis
fennica 24: 87-92.

Palmgren, P. [1930]. Quantitative Untersuchungen über die Vogelfauna in den
Wäldern Südfinnlands, mit besonderer Berücksichtigung Ålands. Acta Zool.
Fenn. 7: 1-218.

Southern, H. N. [1944]. A transect census of pigeons. J. Anim. Ecol. 44: 134-139.

Yapp, W. B. [1955]. The theory of line transects. Bird Study 3: 93-104.




MEASUREMENT ERRORS ASSOCIATED WITH OBTAINING 
ACREAGE ESTIMATES OF COTTON FIELDS 
JACK FLEISCHER, DANIEL G. HORVITZ, J. MALCOLM AIRTH, AND A. L. FINKNER


North Carolina State College 
Raleigh, North Carolina, U.S.A. 


Introduction 


Respondents’ answers to queries by interviewers may be in error 
due to the manner in which questions are worded, how they are asked 
by the interviewers, and how the questions are interpreted by the 
respondent (cf. Deming [1950]). These errors we choose to combine 
into one classification which we refer to as ‘‘measurement errors” as 
Cochran [1953] has done. Hansen, et al. [1951] have written compre- 
hensively about these types of errors. Both of these references have 
recommended that surveys investigating measurement errors be reported 
to accumulate information on their magnitude and effect. Of particular 
interest, among previously reported research in this area, is the work of 
Kish and Lansing [1954] on the estimation of value of homes and that 
of Airth [1955] on measurement errors in estimating field acreages of 
cotton during the crop year 1954. This paper combines the work 
done by Airth with results of surveys on the same subject in the crop 
year 1955. 

The research reported here was part of an overall project aimed at 
helping the Agricultural Estimates Division’ of the USDA improve 
their estimates of cotton acreage and forecasts of cotton production. 
The specific objective was an investigation of methods for determining 
acreage of cotton planted for the July 1 estimate made by the Federal
Crop Reporting Board. In order to investigate different methods, 
measurements on individual fields were considered. 


¹The research was sponsored jointly by the Agricultural Marketing Service (AMS) and the Institute
of Statistics, North Carolina State College. Jack Fleischer is a member of the Raleigh Statistical
Laboratory of AMS and on the faculty of the Department of Experimental Statistics, North Carolina
State College. D. G. Horvitz, now with A. J. Wood and Company, was formerly on the faculty of
the Department of Experimental Statistics, North Carolina State College. J. M. Airth, now with
the Department of Agriculture, Dominion of Canada, was on leave from Canada and engaged in
studies leading to the Master's degree in Experimental Statistics at North Carolina State College.
A. L. Finkner is supervisor of the sampling group on the faculty of the Department of Experimental
Statistics, North Carolina State College.




In 1954, an area probability sample was selected in three counties in 
the Southern Piedmont area of North Carolina. Areas were selected 
using clusters of cotton fields as sampling units. This method of cluster 
sampling is called closed segment sampling because the unit of observation, 
e.g. the cotton field, is always located within the boundaries of the area 
segment. The 22 sampling units selected contained 60 cotton fields and 
three methods of estimating the field acreages were employed. They 
were: 


(1) chain measurements 

(2) farmers’ estimates and 

(3) planimeter measurements of field boundaries delineated on
aerial photographs.


In 1955, the universe was enlarged to an 11 county area which 
comprises the eighth crop reporting district in the Southern Piedmont 
area of North Carolina, so that estimates of acreage and forecasts of 
production could be made and compared with those made by the State 
Crop Reporting Service for the same area. Closed segment sampling 
was again employed. This year 40 sampling units contained 100 cotton 
fields and four different methods of estimating the field acreages were 
employed, including the three methods employed in 1954 plus rotometer 
measurements on aerial photographs. 


The Measurements 


During both years, chain measurements were made by teams of two 
men each. The boundaries of the field were sketched on a record form 
and divided into areas representing figures such as trapezoids and 
triangles. Enough measurements were made to compute the area of 
the figure, e.g. the base and altitude of a triangle. The dimensions 
were recorded on the sketch and the areas computed in the office. In 
1954, several fields were remeasured to get estimates of the variance 
between teams measuring the same fields and the variance between 
measurements made by the same teams. 

Farmers’ estimates of the acreages of the selected fields were obtained 
during personal interviews in July 1954 and in June 1955. Both surveys 
were conducted prior to other measurements made in the sample fields. 
Since farmers’ estimates of field acreages are relatively inexpensive to 
collect, such estimates were obtained for all the cotton planted by each 
selected farm operator as a single figure and for each of his individual 
fields, including those in the sample area segments. 

Field boundaries were delineated on aerial photographs, scaled 660 
feet to the inch, in 1954. It was possible to obtain prints scaled 330 feet 




to the inch in 1955 and the larger scaled aerial photographs were used 
then. The field areas were determined by use of a planimeter, a measur- 
ing instrument which records square inches on a small wheel when the 
circumference of the area is traced. Square inches are converted to 
acres according to the scale of the aerial photograph. In 1955, in 
addition to the planimeter, a rotometer was also employed to estimate 
acreage. This measuring instrument works on the principle of measur- 
ing parallel lines within the field boundaries and a reading in square 
inches is made on the circumference of the instrument. Conversion 
from square inches to acres is again accomplished according to the 
scale of the aerial photograph. 
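The conversion from square inches on the photograph to acres on the ground follows directly from the scale. The sketch below works it through for the two scales mentioned; the function names are hypothetical, and the only outside fact used is that one acre is 43,560 square feet.

```python
SQUARE_FEET_PER_ACRE = 43_560


def acres_per_square_inch(feet_per_inch):
    """Ground area, in acres, represented by one square inch of photograph."""
    return feet_per_inch ** 2 / SQUARE_FEET_PER_ACRE


def photo_area_to_acres(square_inches, feet_per_inch):
    """Convert an instrument reading in square inches to acres."""
    return square_inches * acres_per_square_inch(feet_per_inch)


print(acres_per_square_inch(660))  # 1954 photographs
print(acres_per_square_inch(330))  # 1955 photographs
```

At 660 feet to the inch one square inch covers 10 acres, and at 330 feet to the inch it covers 2.5 acres, so a field of a given acreage occupies four times the photograph area on the 1955 prints.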


The Means and Biases 


Chain measurements were considered the most accurate and objective 
of the measuring schemes. This method was regarded as an unbiased 
technique and considered as the basis for comparing the other measuring 
techniques. In both years the least biased method compared with 
chain measurements was the farmers’ estimates. In 1954, for 60 fields, 
the farmers underestimated their cotton acreage by a little over 1%, 
while in 1955, for 99 fields, the bias was negligible (one one-hundredth 
of 1%) as seen from Table 1. 

The methods with the largest biases were those involving instrument 
measurements on aerial photographs. In 1955, it was considered that 
an optimum job in photograph delineation and instrument reading was 
done. Larger scaled photographs were used to alleviate the problems 
that occurred in 1954 concerning delineation of the field boundaries. 
For planimetering, the bias was 3.29% compared with chain measure- 
ments on 99 fields in 1955. In 1954, the bias was 7.44% for 60 fields 


TABLE 1

COMPARISON OF FIELD MEANS AND PER CENT OF BIAS FOR COTTON ACREAGE
DETERMINED BY DIFFERENT METHODS OF MEASUREMENT FOR TWO YEARS

                            No. of              Farmers'
                            Fields    Chain     Estimate   Planimeter   Rotometer

Means, 1954                   60      3.9692    3.9258     4.2646
Means, 1955                   99      4.2061    4.2056     4.3398       4.4134
Biases in Per Cent, 1954      60      0         -1.09      7.44
Biases in Per Cent, 1955      99                           3.29




but the job of delineating boundaries on the photographs was not as 
accurate. The rotometer measurements reflect the difference in instru- 
ments; this instrument consistently gave higher readings than the 
planimeter for the same area. 


Error Component Models 


The models which follow were used in the analysis of the data. The
estimate of cotton acreage for a segment i, using chain measurements
which are assumed to yield the true cotton acreage for the segment, is
represented by the following model:

$$y_i = \mu + \tau_i,$$

where $\mu$ is the mean cotton acreage of all segments in the population
and $\tau_i$ is the deviation of the true cotton acreage of segment i from $\mu$.
In 1954, several of the fields were remeasured by different teams and
some of these fields remeasured by the same teams. An analysis of
variance furnished an estimate of the variation between measurements
of the same field. This component was less than 1% of the magnitude
of the field to field variance. Therefore, for estimating total acreage,
one chain measurement per field was considered sufficient.

For the other methods of measuring the cotton acreage in segment i,
the model includes a term, $\beta_i$, for bias and is represented by:

$$x_i = \mu + \tau_i + \beta_i.$$

The term $\beta_i$ may contain three components:

(1) an overall bias $\beta$ common to all fields measured by this method,

(2) a bias in measuring an individual field, whose magnitude and
direction may vary from one field to another and,

(3) a random reporting or measurement error.

Since the data do not permit separate evaluation of components (2) and
(3), the term $\beta_i$ will be referred to, for simplicity, as a bias. The $\tau_i$
and $\beta_i$ are assumed to be correlated, so that the mean square error of
the sample mean of any measurement technique for n segments selected
at random is:

$$\mathrm{MSE}(\bar{x}) = [\sigma_\tau^2 + \sigma_\beta^2 + 2\operatorname{cov}(\beta\tau)]/n + \beta^2.$$

The component $\sigma_\tau^2$ represents pure sampling variance. The remaining
components arise out of reporting or measurement biases.

Using the sample data for each year, unbiased estimates of the various 
error components were computed as follows: 


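(i) $\bar{x} - \bar{y}$ estimates $\beta$,

(ii) $s_y^2$ estimates $\sigma_\tau^2$,

(iii) $s_x^2$ estimates $\sigma_\tau^2 + \sigma_\beta^2 + 2\operatorname{cov}(\beta\tau)$,

(iv) $s_{xy}$ estimates $\sigma_\tau^2 + \operatorname{cov}(\beta\tau)$, and

(v) $(\bar{x} - \bar{y})^2 - s_\beta^2/n$ estimates $\beta^2$.

The last result is proved in the Appendix.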
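The estimates listed above can be computed from paired segment totals. The sketch below is one reading of them, not the authors' own computation: x holds a method's segment acreages and y the chain measurements for the same segments, and the estimate of the variance of the individual biases is taken as $s_x^2 + s_y^2 - 2s_{xy}$, an assumption inferred from the relations among (ii), (iii) and (iv). The function name and the toy data are hypothetical.

```python
def error_components(x, y):
    """Estimate bias and variance components for a measurement method (x)
    against chain measurements (y) taken on the same n segments."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have equal length of at least 2")
    xbar = sum(x) / n
    ybar = sum(y) / n
    s_x2 = sum((a - xbar) ** 2 for a in x) / (n - 1)
    s_y2 = sum((b - ybar) ** 2 for b in y) / (n - 1)
    s_xy = sum((a - xbar) * (b - ybar) for a, b in zip(x, y)) / (n - 1)
    s_beta2 = s_x2 + s_y2 - 2.0 * s_xy  # assumed estimate of var(beta_i)
    b = xbar - ybar                     # (i)  overall bias
    return {
        "b": b,
        "b2": b * b - s_beta2 / n,      # (v)  may come out negative
        "s_x2": s_x2,                   # (iii)
        "s_y2": s_y2,                   # (ii)
        "s_beta2": s_beta2,
        "two_cov": 2.0 * (s_xy - s_y2), # from (iv) minus (ii), doubled
    }


# Toy data: every segment over-measured by exactly one acre.
parts = error_components(x=[2.0, 3.0, 5.0], y=[1.0, 2.0, 4.0])
print(parts)
```

By construction the components satisfy s_x2 = s_y2 + s_beta2 + two_cov, which is the same additive relation that holds among the variance columns of the table of results.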


Analysis of Error Components 


In Table 2, the error components estimated from the sample data for 
the two years are presented along with the means and biases. The 
sampling variances are greater in 1955 because a much larger area 
universe is covered; consequently, the range of cotton acreage per 
segment is greater. The estimates of $\beta^2$ ($b^2$) are very small; the largest
is 0.7% of the estimated mean square error (MSE) for planimeter
measurements in 1954. The estimates for $\beta$ and 2 cov($\beta\tau$) are exactly
the same in sign for the two years. There are negative covariances
for farmers’ estimates with chain measurements both years. 

The estimated variance for farmers' estimates is slightly less than
the estimated variance for chain measurements. $b^2$ is negative for both
years and considered to be zero. The variance of the individual biases
$s_\beta^2$ was only 2.55% of the total variance $s_x^2$ in 1954 and 2.25% of the
total variance in 1955. The covariance term is larger in 1955 but
indicates in both years a tendency for farmers to underestimate the 
acreage in large fields and overestimate acreage in small fields. It is 
interesting to note the closeness of these estimates for the different 
years although the second year data refer to an enlarged universe. 

For planimeter measurements in 1954, the variance of the individual 
biases is 1.25% of the total variance, but the covariance term is 12.67% 
of the total variance. This indicates some tendency for planimeter 
measurements to undermeasure the acreage in small fields and over- 
measure the acreage in large fields. The improvement in the planimeter 
measurements for 1955 is reflected by the components. The variance
of the individual biases is only 0.20% of the total variance and the
covariance component is 6.15% of the total variance. While the
covariance term is reduced to 63% of the 1954 estimate, it is apparent
that positive covariance is inherent in planimeter measurement. The
results of the rotometer measurements in 1955 produced estimates of 
error components that are slightly larger in every instance than the 
results for planimeter measurements. 




TABLE 2

ESTIMATED MEANS, BIASES, AND VARIANCE COMPONENTS FOR DIFFERENT METHODS
OF MEASURING ACREAGE COMPARED WITH CHAIN MEASUREMENTS FOR 1954
AND 1955 SURVEYS CONDUCTED IN NORTH CAROLINA

                 No. of             Bias                              Variance
Method           Units    Mean      (b)      MSE*     $b^2$      $s_x^2$   $s_y^2$   $s_\beta^2$   2 Cov($\beta\tau$)

1954 per s.u. (60 fields)
Chain             22     10.825     --       68.10    --          68.10    68.10     --         --
Farmers' Est.     22     10.707    -0.118    67.83    -.064**     67.83    68.10     1.73      -2.00
Planimeter        22     11.631     0.806    79.71     .604       79.11    68.10     0.99      10.02

1955 per s.u. (99 fields)
Chain             40     10.410     --       95.55    --          95.55    95.55     --         --
Farmers' Est.     40     10.409    -.001     92.18    -.052**     92.18    95.55     2.07      -5.44
Planimeter        40     10.752     .342    102.14     .112      102.03    95.55     0.21       6.27
Rotometer         40     10.930     .520    104.89     .262      104.63    95.55     0.33       8.75

*Estimated mean square error of $\bar{y}$ for chain measurements and estimated mean square error of $\bar{x}$
for all other measurements. (Estimated mean square error of $\bar{y}$ equals $s_y^2$.)
**Where $\beta^2$ is estimated negative it is considered to be zero for computing MSE($\bar{x}$).


Conclusions 


From the analysis of the sample data for the two years, it is apparent 
that a practical and accurate means of obtaining estimates of cotton 
acreage planted as of July 1 is from the farmers themselves. This 
method is relatively free of bias for this population, but for crops not 
under allotment, the farmers may not be able to report field acreages as 
accurately as for crops with allotments. It may then be necessary to 
have a check on farmers’ acreage estimates with some objective measure- 
ment. Even for allotment crops it would be well to do some objective 
measuring for other populations. 

It should also be pointed out that using estimates made subjectively,
although practically unbiased for two years, may not always lack bias,
even for the same population. That is the reason for suggesting some
objective measuring scheme for all acreage measurements.


Appendix 

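Since $x_i - y_i = \beta_i$, it follows that

$$E(\bar{x} - \bar{y})^2 = \operatorname{var}(\bar{x} - \bar{y}) + [E(\bar{x} - \bar{y})]^2 = \sigma_\beta^2/n + \beta^2.$$
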
REFERENCES

Airth, John Malcolm [1955]. A comparison of four methods of measuring acreage.
Master's Thesis, N. C. State College.

Cochran, William G. [1953]. Sampling Techniques. New York: Wiley, Chapter 13.

Deming, William E. [1950]. Some Theory of Sampling. New York: Wiley, Chapter 2.

Hansen, Morris H., et al. [1951]. Response errors in surveys. Jour. Amer. Stat.
Assoc. 46: 149-190.

Kish, Leslie and Lansing, John B. [1954]. Response errors in estimating the value
of homes. Jour. Amer. Stat. Assoc. 49: 520-538.

Miravalle, Sarah J. [1956]. A comparison of alternative methods of defining and
allocating area sampling units for agricultural surveys. Master's Thesis,
North Carolina State College.




A SEQUENTIAL MULTIPLE-DECISION PROCEDURE 
FOR SELECTING THE BEST ONE OF SEVERAL NORMAL 
POPULATIONS WITH A COMMON UNKNOWN VARIANCE, 
AND ITS USE WITH VARIOUS EXPERIMENTAL DESIGNS¹


ROBERT E. BECHHOFER


Department of Industrial and Engineering Administration 
Sibley School of Mechanical Engineering 
Cornell University, Ithaca, New York, U.S.A. 


1. INTRODUCTION 
1.1 Preliminary remarks 


The scientist often is faced with the problem of designing an experi- 
ment which has as its goal the determination of conditions which will 
maximize the mean value of some response. For example, a metallurgist 
may wish to determine which of several types of alloys will produce the 
highest mean tensile strength, an ordnance engineer may wish to 
determine which of several lots of projectiles will produce the longest 
mean range, or an agronomist may wish to determine which of several 
varieties of grain will produce the highest mean yield. All of the above 
problems are alike in the following important respect: the factor under 
consideration (type of alloy, lot of projectiles, variety of grain) is qualita- 
tive. They are to be differentiated from problems in which the factor 
under consideration is quantitative. (If the agronomist wished to deter- 
mine for a given variety of grain which rate of application of a particular
fertilizer will produce the highest mean yield, then the factor under 
consideration is quantitative.) In multifactor experiments it is possible 
to have various combinations of qualitative and quantitative factors. 
For example, in two-factor experiments both of the factors can be 
qualitative, one can be qualitative and one quantitative, or both can 
be quantitative. In general, in a p-factor experiment, p + 1 combina- 
tions are possible. 


¹This paper is a revision of one which was presented before the Symposium on Design of Industrial
Experiments, Raleigh, North Carolina, November 9, 1956. The earlier version appears in the
Proceedings of the symposium. This research was supported by the United States Air Force through
the Air Force Office of Scientific Research of the Air Research and Development Command, under
Contract No. AF 18(600)-331. Reproduction in whole or part is permitted for any purpose of the
United States Government.



If the scientist is interested in the determination of conditions
which will maximize the mean value of some response, his choice of a 
statistical procedure will depend on the particular combination of 
qualitative and quantitative factors in his problem. Several procedures 
have been proposed to be used when all of the factors are quantitative; 
however, we do not propose to discuss these procedures here. The 
procedures that we will discuss in this paper are to be used when all of the 
factors are qualitative. (They also can be used when some or all of the 
factors are quantitative, and the quantitative factors are fixed at levels 
which are the only ones to be considered.) For simplicity of exposition 
the emphasis of this paper will be on single-factor experiments; however 
some comments concerning multifactor experiments will be made in 
Section 5. 

It might be appropriate at this point to mention that none of the 
so-called tests of homogeneity (such as the Analysis of Variance test 
that k population means are equal) is appropriate for this problem, for 
in our formulation the scientist knows a priori that the hypothesis of 
homogeneity is false. Clearly, if such is the case, there is no justification 
for conducting the test. The question that really is in the back of the 
scientist’s mind is: ‘Since I am reasonably certain that the populations 
under consideration are not homogeneous (that is, are different insofar 
as the parameter of interest is concerned), is it possible for me to say 
which one of the populations is in some sense best?” 


1.2 Relation of this paper to previous papers 


This paper is one in a series which deals with the problem of selecting 
the “best”² one of several normal populations. In all, four multiple-
decision procedures using the same general approach have been proposed 
in these papers. In this paper we shall refer to these procedures as: 


Procedure A: A single-sample procedure which can be used when the 
populations have a common known variance. 

Procedure B: A two-sample procedure which can be used when the 
populations have a common unknown variance. 

Procedure C: A sequential procedure which can be used when the 
populations have a common known variance. 

Procedure D: A sequential procedure which can be used when the 
populations have a common unknown variance. 


This paper is concerned principally with Procedure D (and to some 
extent with Procedure C since the latter is a special case of the former). 


²“Goodness” is measured here in terms of the population means, the best population being the 
one with the largest population mean. 




Procedure A is described in detail in [1]³ which also contains tables 
for applying it; an expository account of its use is given in [2]. An 
optimum property of the procedure is given in [19]. An extension of 
Table 1 in [1] is given as Table Al in [17]; this latter paper describes an 
alternative approach to the problem. 

Procedure B is described in detail in [3], and tables for applying it 
are given in [14] and [15]. Extensions of Table 3 in [14] are given as 
Tables la and 1b in [13] and as Table II in [18]. 

Procedure C is reported on in [8] and a modification of it is reported 
on in [7]. It will be described in detail in [9] and the theory underlying 
it will be given in [5]. 

Procedure D, the subject of the present paper, is reported on in [10]. 

The same general approach that was used in the above procedures 
applies equally well to selection problems involving other parameters 
and/or types of populations. Thus, [6] and [11] deal with the problem 
of selecting the normal population with the smallest population variance; 
[9] also considers this problem. The problem of selecting the binomial 
population with the smallest population probability of ‘‘success’’ is 
considered in [20] (as well as in [9]). The problem of selecting the 
multinomial event with the largest population probability is considered 
in [4] and [12] (as well as in [5]). The problem of selecting the exponential 
population with the largest scale parameter is considered in 
[9] and in [21]; the latter emphasizes the life-test aspects of the problem. 
Several other selection problems are treated in [5] and [9]. The procedure 
described in [16] deals with selection problems involving only two popu- 
lations; it can be regarded as a special case of the procedure described 
in [9] since the latter deals with selection problems involving an arbitrary 
number of populations. 

No attempt has been made here to list papers describing multiple- 
decision procedures which use essentially different approaches from the 
one which is common to almost all of the papers referred to above. 


1.3 Organization of this paper 


The major ideas in the present paper are contained in three sections: 
Section 2 describes Procedure D and tells how it can be used with a 
completely randomized design; Section 3 tells how it can be used with 
experimental designs such as randomized blocks, cross-overs, and Latin 
squares; Section 4 gives a worked-out numerical example showing how 
the exact procedure and approximations to the exact procedure are 
applied with a completely randomized design and with other designs. 


³Square brackets refer to references listed at the end of this paper. 


2. DESCRIPTION OF PROCEDURE D 


2.1 Statistical assumptions 


Throughout this paper the statistical assumptions that are made 
are the same as those made for Model I (all “effects” fixed) of the 
Analysis of Variance. Thus, the observations $X_{ij}$ are normally and 
independently distributed chance variables with unknown population 
means $\mu_i = \mu + \alpha_i$ and a common unknown variance $\sigma^2$ 
($i = 1, 2, \ldots, k$; $j = 1, 2, \ldots$, ad inf.). In this simple model, the one associated with 
a completely randomized design, $\mu$ represents the grand mean, and the 
$\alpha_i$, $\sum_{i=1}^{k} \alpha_i = 0$, represent the treatment “effects.” (In the models 
associated with more complicated designs, it is assumed that the population 
means can be expressed as the sum of a grand mean, treatment 
“effects,” and additional “effects.”) The ranked population means 
are denoted by 

$$\mu_{[1]} \le \mu_{[2]} \le \cdots \le \mu_{[k]}.$$

It is not known which population is associated with $\mu_{[i]} = \mu + \alpha_{[i]}$ 
($i = 1, 2, \ldots, k$). The differences between the largest and the remaining 
ranked population means are denoted by 

$$\delta_{k,i} = \mu_{[k]} - \mu_{[i]} \qquad (i = 1, 2, \ldots, k - 1).$$

2.2 The experimenter’s goal, specification, and requirement 


GOAL: The experimenter’s goal is to select the population associated 
with $\mu_{[k]}$. 


The statistical formulation of the problem for this goal involves 
the true difference $\delta_{k,k-1} = \delta$ (say) and the true probability $P$ of a 
correct selection. It is assumed that before experimentation starts 
the experimenter can specify a pair of constants ($\delta^*$, $P^*$) with 
$0 < \delta^* < \infty$ and $1/k < P^* < 1$, as described below. 


SPECIFICATION: The experimenter specifies: 


a) The smallest value, $\delta^*$, of the difference $\delta$ that is worth detecting, and 

b) The smallest acceptable value, $P^*$, of the probability $P$ of 
achieving the above goal when $\delta \ge \delta^*$. 


The specification above is summarized in the following: 


REQUIREMENT: The experimenter requires that the procedure to 
be used guarantee that 

$$\text{Probability (Correct selection} \mid \delta_{k,k-1} \ge \delta^*) \ge P^*,$$




that is, the probability of a correct selection is to be equal to or greater 
than $P^*$ whenever the true (but unknown) difference between the 
largest and second largest population mean is equal to or greater than $\delta^*$. 

The two-sample procedure [3] will guarantee this requirement when 
the $X_{ij}$ satisfy the above statistical assumptions. We will describe 
here a sequential procedure [10] that will guarantee this requirement 
when the $X_{ij}$ satisfy these same statistical assumptions; as will be 
pointed out in Section 3.1, the sequential procedure is superior to the 
two-sample procedure in several important respects. 


2.3 Definition of symbols 


We denote the sample sum based on the first $m$ observations from the 
$i$th population by $Y_{im}$ ($i = 1, 2, \ldots, k$), and the ranked $Y_{im}$ by 

$$Y_{[1]m} \le Y_{[2]m} \le \cdots \le Y_{[k]m}.$$

The procedure depends on the $k(k - 1)$ signed differences 

$$D_{(i,j)m} = Y_{[i]m} - Y_{[j]m} \qquad (i \ne j;\ i, j = 1, 2, \ldots, k),$$

and $S_m^2$, the unbiased estimate of $\sigma^2$ based on $d_m$ degrees of freedom after 
$m$ stages of experimentation. (In general, $d_m = d_0(1) + d_m(2)$ where 
$d_0(1)$ is the number of degrees of freedom associated with an estimate of 
$\sigma^2$ which may be available, having been obtained from previous comparable 
experiments before experimentation starts, and $d_m(2)$ is the 
number of degrees of freedom associated with an estimate of $\sigma^2$ made 
from the present experiment after $m$ stages of experimentation. Clearly, 
both $d_0(1)$ and $d_m(2)$ depend on the experimental designs that were used, 
and in most instances the same design should be associated with both.) 

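The differences $D_{(i,j)m}$ appear in $k$ positive definite quadratic forms 

$$Q_{(k-i+1)m} = \sum_{\substack{\alpha=1 \\ \alpha \ne i}}^{k} \sum_{\substack{\beta=1 \\ \beta \ne i}}^{k} A_{\alpha\beta}\,[D_{(i,\alpha)m} - m\delta^*][D_{(i,\beta)m} - m\delta^*] \qquad (i = 1, 2, \ldots, k),$$

each of order $k - 1$. Here $A_{\alpha\beta} = 2(k - 1)/k$ for $\alpha = \beta$ and $-2/k$ for 
$\alpha \ne \beta$. It can easily be shown that 

$$Q_{(1)m} \le Q_{(2)m} \le \cdots \le Q_{(k)m}.$$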
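As an illustrative check (a sketch, not part of the original procedure description), the quadratic forms can be evaluated directly from the sample sums. The function name `quadratic_forms` and its arguments are assumptions made for this example; the sums 119.50, 54.40, 90.00 are the two-stage sums of the numerical example in Section 4, where $\delta^* = 4$. The sketch evaluates each form with the coefficients $A_{\alpha\beta}$ given above; the assertions one can make on the result are that the forms come out ranked and that $Q_{(k-i+1)m} - Q_{(1)m} = 4m\delta^* D_{(k,i)m}$, an identity obtained by expanding the forms.

```python
def quadratic_forms(sums, m, delta_star):
    """Return [Q_(1)m, ..., Q_(k)m] computed from the k sample sums.

    Q_(k-i+1)m uses the i-th ranked sum Y_[i]m as pivot, so Q_(1)m
    belongs to the population with the largest sum.
    """
    y = sorted(sums)                     # y[0] = Y_[1]m <= ... <= y[k-1] = Y_[k]m
    k = len(y)
    q = [0.0] * k
    for i in range(k):                   # 0-based pivot, i.e. population [i+1]
        # deviations D_(i,a)m - m*delta_star for every a other than the pivot
        x = [y[i] - y[a] - m * delta_star for a in range(k) if a != i]
        total = 0.0
        for a, xa in enumerate(x):
            for b, xb in enumerate(x):
                coef = 2.0 * (k - 1) / k if a == b else -2.0 / k
                total += coef * xa * xb
        q[k - 1 - i] = total             # store as Q_(k-i+1)m (0-based slot k-1-i)
    return q


q = quadratic_forms([119.50, 54.40, 90.00], m=2, delta_star=4.0)
```

For these sums the smallest form belongs to the population with the largest sum, which is why that population is the one selected when the rule calls for stopping.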


The quadratic forms $Q_{(k-i+1)m}$ and the estimate $S_m^2$ enter the procedure 
in $k$ expressions 

$$M_{(i)m} = \left\{1 + \frac{Q_{(k-i+1)m}}{2 m d_m S_m^2}\right\}^{-(d_m + k - 1)/2} \qquad (i = 1, 2, \ldots, k);$$

it follows from the definition of the $M_{(i)m}$ and the fact that the $Q_{(i)m}$ 
are ranked that 

$$M_{(1)m} \le M_{(2)m} \le \cdots \le M_{(k)m}.$$


The following sequential procedure (Procedure D) guarantees the 
requirement and terminates with probability unity. (A proof of this 
statement will be given in a forthcoming paper by M. Sobel and the 
present author. A derivation of the procedure will also be given in 
that paper.) 


2.4 Rule for Procedure D 


The procedure makes use of the constants $\delta^*$ and $P^*$ which are 
specified before experimentation starts. The rule is: 

“At the $m$th stage of experimentation ($m = 1, 2, \ldots$) take an observation 
from each of the $k$ populations. Starting with $m = 2$, compute 
the statistic 

$$Z_m(d_m) = \sum_{i=1}^{k-1} \frac{M_{(i)m}}{M_{(k)m}}.$$

a) If $Z_m(d_m) \le (1 - P^*)/P^*$, stop experimentation and choose the 
population which yielded the largest sum, $Y_{[k]m}$, as the one having the 
largest population mean. 

b) If $Z_m(d_m) > (1 - P^*)/P^*$, take another observation from each of 
the $k$ populations and compute $Z_{m+1}(d_{m+1})$. 

Continue in this manner until the rule calls for stopping.” 
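The rule can be sketched in a few lines for the completely randomized design with no prior variance estimate. This is an illustration under stated assumptions, not the author's computing scheme: the function names are invented for the example, the variance estimate is the pooled within-population mean square on $k(m-1)$ degrees of freedom, and each term of the statistic is taken as $\{1 + Q/(2 m d_m S_m^2)\}^{-(d_m+k-1)/2}$ divided by the corresponding term for the population with the largest sum. The data are the first two stages of the numerical example in Section 4.

```python
def exact_statistic(data, delta_star):
    """Z_m(d_m) for a completely randomized design, assuming d_0(1) = 0.

    data[i] holds the m >= 2 observations taken so far from population i+1.
    """
    k, m = len(data), len(data[0])
    d = k * (m - 1)                                  # degrees of freedom d_m
    sums = sorted(sum(x) for x in data)              # ranked sums Y_[1]m .. Y_[k]m
    # pooled within-population sum of squares, equal to d_m * S_m^2
    ss = sum(sum((v - sum(x) / m) ** 2 for v in x) for x in data)
    t = m * delta_star
    terms = []
    for i in range(k):                               # pivot on ranked population [i+1]
        x = [sums[i] - sums[a] - t for a in range(k) if a != i]
        # quadratic form with A = 2(k-1)/k on the diagonal and -2/k elsewhere
        q = 2.0 * sum(v * v for v in x) - (2.0 / k) * sum(x) ** 2
        terms.append((1.0 + q / (2.0 * m * ss)) ** (-(d + k - 1) / 2.0))
    return sum(terms[:-1]) / terms[-1]               # last term: largest sum


def should_stop(data, delta_star, p_star):
    """True when the rule calls for stopping at the current stage."""
    return exact_statistic(data, delta_star) <= (1.0 - p_star) / p_star


two_stages = [[43.94, 75.56], [31.74, 22.66], [20.92, 69.08]]
z2 = exact_statistic(two_stages, delta_star=4.0)
```

With these two stages the statistic is well above $(1 - P^*)/P^* = 1/3$ for $P^* = 0.75$, so the rule calls for another observation from each population.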


2.5 Relation of Procedure D to Procedure C 


It is of considerable interest to examine the operation of the rule 
for Procedure D as $d_m$ grows large. It can be verified that this rule 
could be stated equivalently as: 

“At the $m$th stage of experimentation ($m = 1, 2, \ldots$) take an observation 
from each of the $k$ populations. Starting with $m = 2$, compute 
the statistic 

$$Z_m(d_m) = \sum_{i=1}^{k-1} \exp\left\{-\frac{\delta^* D_{(k,i)m}}{S_m^2}\,(1 + U_{im} + V_{im})\right\}$$

where $e$ is the base of the natural logarithm system. Here 

$$U_{im} = \frac{1}{d_m}\left\{(k - 1) - \frac{1}{k m S_m^2}\left[\sum_{\alpha=2}^{k}\sum_{\beta=1}^{\alpha-1} D_{(\alpha,\beta)m}^2 - m\delta^* \sum_{\alpha=1}^{k}\left(D_{(k,\alpha)m} + D_{(i,\alpha)m}\right) + (k - 1)(m\delta^*)^2\right]\right\}$$

(with $D_{(i,i)m} = 0$), and $V_{im}$ is a function of the $X_{ij}$ ($i = 1, 2, \ldots, k$; $j = 1, 2, \ldots, m$) and 
$\delta^*$, which is of the order $1/d_m^2$, and hence for each $m$ it approaches zero 
more rapidly than $U_{im}$ as $d_m$ grows large. 

a) If $Z_m(d_m) \le (1 - P^*)/P^*$, stop experimentation and choose the 
population which yielded the largest sum, $Y_{[k]m}$, as the one having the 
largest population mean. 

b) If $Z_m(d_m) > (1 - P^*)/P^*$, take another observation from each 
of the $k$ populations and compute $Z_{m+1}(d_{m+1})$. 

Continue in this manner until the rule calls for stopping.” 

It is well known that as $d_m$ grows large the statistic $S_m^2$ approaches 
$\sigma^2$, and hence the statistic $Z_m(d_m)$ approaches the statistic 

$$Z_m(\infty) = \sum_{i=1}^{k-1} \exp\{-\delta^* D_{(k,i)m}/\sigma^2\}.$$

Moreover, the above rule with $d_m = \infty$ is precisely the rule associated 
with Procedure C. Thus, as $d_m$ grows large, Procedure D becomes equivalent 
to Procedure C. 
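The known-variance statistic is simple enough to compute by hand, but a short sketch makes the limiting form concrete. The function name and arguments below are assumptions made for the illustration; the statistic is taken as the sum, over every population except the one with the largest sample sum, of $\exp\{-\delta^* D/\sigma^2\}$, where $D$ is the difference between the largest sum and that population's sum. The sums are those of the first stage of the numerical example in Section 4, with $\delta^* = 4$ and $\sigma = 20$.

```python
import math


def known_variance_statistic(sums, delta_star, sigma):
    """Z_m(infinity): the Procedure C statistic for a known common sigma."""
    y = sorted(sums)                      # y[-1] is the largest sample sum
    return sum(math.exp(-delta_star * (y[-1] - yi) / sigma ** 2) for yi in y[:-1])


z1 = known_variance_statistic([43.94, 31.74, 20.92], delta_star=4.0, sigma=20.0)
```

Because no variance estimate is needed, this statistic can already be computed at the first stage, whereas the unknown-variance statistic requires at least two observations per population.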


2.6 Approximations to Procedure D 
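When $d_m$ is moderately large the rule could be based on the statistic 

$$\tilde{Z}_m(d_m) = \sum_{i=1}^{k-1} \exp\left\{-\frac{\delta^* D_{(k,i)m}}{S_m^2}\,(1 + U_{im})\right\},$$

and when $d_m$ is very large (as judged by the magnitude of the $U_{im}$ and 
the stability of $S_m^2$) the rule could be based on the statistic 

$$\tilde{\tilde{Z}}_m(d_m) = \sum_{i=1}^{k-1} \exp\{-\delta^* D_{(k,i)m}/S_m^2\}.$$
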
Use of the above approximations makes the necessary computations 
much simpler with little loss of accuracy. 
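The two approximations can be sketched as follows for a completely randomized design with no prior variance estimate. This is an illustration only: the function name is invented, the cruder statistic is taken as the known-variance form with $\sigma^2$ replaced by the pooled estimate, and the finer one multiplies each exponent by $(1 + U)$, with $U$ written out in terms of the differences between ranked sums as an assumption of the sketch. The data are the first two stages of the numerical example in Section 4.

```python
import math


def approximate_statistics(data, delta_star):
    """Return (crude, refined) approximations to Z_m(d_m).

    crude   uses exp{-delta* D / S^2} only;
    refined multiplies each exponent by (1 + U).
    data[i] holds the m >= 2 observations from population i+1.
    """
    k, m = len(data), len(data[0])
    d = k * (m - 1)
    y = sorted(sum(x) for x in data)                       # ranked sums
    s2 = sum(sum((v - sum(x) / m) ** 2 for v in x) for x in data) / d
    sum_d2 = sum((y[a] - y[b]) ** 2 for a in range(k) for b in range(a))
    t = m * delta_star
    crude = refined = 0.0
    for i in range(k - 1):                                 # every population but the largest
        lead = delta_star * (y[-1] - y[i]) / s2
        # sum over all a of D_(k,a) + D_(i,a); the a = k and a = i terms are zero
        cross = sum((y[-1] - ya) + (y[i] - ya) for ya in y)
        u = ((k - 1) - (sum_d2 - t * cross + (k - 1) * t * t) / (k * m * s2)) / d
        crude += math.exp(-lead)
        refined += math.exp(-lead * (1.0 + u))
    return crude, refined


crude2, refined2 = approximate_statistics(
    [[43.94, 75.56], [31.74, 22.66], [20.92, 69.08]], delta_star=4.0)
```

Even with only three degrees of freedom the refined value lies much closer to the exact statistic than the crude one does, which is the behavior the approximations are meant to exploit.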


3. USE OF PROCEDURE D WITH VARIOUS EXPERIMENTAL DESIGNS 


3.1 Desirable properties of Procedure D 


Procedure D has two important virtues which make it particularly 
attractive for use in experimental situations: 


a) It can capitalize on any decrease in the underlying variance of 
the experiment which the experimenter can effect. (And it is 
not necessary that the experimenter know even the magnitude 
of this variance.) 

b) It can capitalize on any favorable configuration of the population 
means. 




We shall elaborate a bit on these two points. 

The experimenter usually does not know what the population vari- 
ance of his experiment would be if he were to use a particular experi- 
mental design—he only knows (or hopes) that the particular 
experimental design that he proposes to use will decrease this underlying 
variance (sufficiently to compensate for the loss of degrees of freedom). 
Procedure D can capitalize on any such decrease, and the experimenter 
reaps the benefit in a decrease in the average sample size required for the 
experiment to terminate. 

Also, the experimenter seldom (if ever) knows the true configuration 
of the population means. However, in order to meet his requirement 
he must operate as if the population means were in the least favorable 
configuration.⁴ If Nature presents the experimenter with a more 
favorable configuration, Procedure D can capitalize on this generosity. 
And again the experimenter reaps the benefit in a decrease in the average 
sample size required for the experiment to terminate. As the underlying 
variance of the experiment approaches zero and/or as the configuration 
of the population means becomes more and more favorable, the average 
sample size (per population) approaches two! 

It is true that Procedure B can also capitalize on any decrease in 
the underlying variance, but it cannot do so to the same extent as 
Procedure D. For the experimenter must commit himself to a first 
sample (the size of which is usually greater than two per population); 
and then the best the experimenter can hope for is that he will not 
have to take a second sample. Furthermore, when the experimenter 
uses Procedure B he must operate (as he does with Procedure D) as if 
the population means were in the least favorable configuration. But if 
Nature presents the experimenter with a more favorable configuration, 
Procedure B cannot capitalize on this generosity. Hence, in these two 
very important respects, Procedure D is superior to Procedure B. 


3.2 Instructions for use of Procedure D with various experimental designs 


In order to describe how to use Procedure D with the various experimental 
designs it suffices to tell how to take the observations, and how 
to compute $S_m^2$ and $d_m$. In the description given below, it is assumed 
that $d_0(1) = 0$ and hence that $d_m = d_m(2)$; if $d_0(1) > 0$, the modification 
in computing $S_m^2$ is obvious. In every instance $S_{m(2)}^2$ is the residual 
mean square obtained from the ordinary Analysis of Variance after $m$ 
observations have been taken from each population (and it is computed in 
the usual way), while $d_m(2)$ is the associated number of degrees of freedom. 


⁴That is, a configuration in which $\mu_{[1]} = \mu_{[k-1]} = \mu_{[k]} - \delta^*$. Any configuration for which 
one or both of these “=” signs is replaced by a “<” sign is more favorable to the experimenter since 
the same requirement can be guaranteed with a smaller average sample size. 

Some of the simple standard designs are listed below. In each case 
the $\alpha_i$ represent the treatment “effects,” and it is desired to select the 
population associated with $\alpha_{[k]}$. 


a) Completely randomized design: 


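The $X_{ij}$ have population means 

$$\mu + \alpha_i, \qquad \sum_{i=1}^{k} \alpha_i = 0 \qquad (i = 1, 2, \ldots, k;\ j = 1, 2, \ldots).$$

Then 

$$S_{m(2)}^2 = \frac{1}{k(m - 1)} \sum_{i=1}^{k} \sum_{j=1}^{m} (X_{ij} - \bar{X}_{i\cdot})^2$$

and $d_m(2) = k(m - 1)$. 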


b) Randomized blocks design: 
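The $X_{ij}$ have population means 

$$\mu + \alpha_i + \beta_j, \qquad \sum_{i=1}^{k} \alpha_i = \sum_{j=1}^{m} \beta_j = 0 \qquad (i = 1, 2, \ldots, k;\ j = 1, 2, \ldots).$$

Then 

$$S_{m(2)}^2 = \frac{1}{(k - 1)(m - 1)} \sum_{i=1}^{k} \sum_{j=1}^{m} (X_{ij} - \bar{X}_{i\cdot} - \bar{X}_{\cdot j} + \bar{X}_{\cdot\cdot})^2$$

and $d_m(2) = (k - 1)(m - 1)$. 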


c) Cross-over design: 
The $X_{ijp}$ have population means 

$$\mu + \alpha_i + \beta_j + \gamma_p, \qquad \sum_{i=1}^{k} \alpha_i = \sum_{j=1}^{m} \beta_j = \sum_{p=1}^{k} \gamma_p = 0$$

($i, p = 1, 2, \ldots, k$; $j = 1, 2, \ldots$) in a pattern of the type given 
(for $k = 3$, $m = 6$) below: 

[Array of population means $\mu + \alpha_i + \beta_j + \gamma_p$ for $k = 3$, $m = 6$; individual entries not legible in the source.] 

Then $S_{m(2)}^2$ is computed in the obvious manner and $d_m(2) = (k - 1)(m - 2)$. 

The experimenter can compute $Z_m(d_m)$ only when $m$ divided by $k$ 
is an integer (that is, at the $k$th, $2k$th, $3k$th, etc., stages) and the rule can 
be applied only for those values of $m$. 

d) Latin square design (replicated): 


The $X_{ijp}$ have population means 

$$\mu + \alpha_i + \beta_j + \gamma_p, \qquad \sum \alpha_i = \sum \beta_j = \sum \gamma_p = 0$$

($i = 1, 2, \ldots, k$; $j, p = 1, 2, \ldots$) in a pattern of the type given 
(for $k = 3$, $m = 6$) below: 

[Array of population means $\mu + \alpha_i + \beta_j + \gamma_p$ for $k = 3$, $m = 6$; individual entries not legible in the source.] 

Then $S_{m(2)}^2$ is computed in the obvious manner and $d_m(2) = (k - 2)(m - 1)$. 

The experimenter can compute $Z_m(d_m)$ only when $m$ divided by $k$ 
is an integer (that is, at the $k$th, $2k$th, $3k$th, etc., stages) and the rule can 
be applied only for those values of $m$. 


e) Balanced incomplete blocks design (replicated): 
Procedure D can be applied, but the method is not given here. 
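For the two simplest designs the residual mean square and its degrees of freedom can be sketched as below. The function name `residual_mean_square` and the design labels `"crd"` and `"rb"` are assumptions made for this illustration; for randomized blocks the sketch treats stage $j$ as block $j$, and the residual is the usual two-way one, $X_{ij}$ minus its treatment mean minus its block mean plus the grand mean. The data are the first two stages of the numerical example in Section 4.

```python
def residual_mean_square(data, design):
    """Return (residual mean square, degrees of freedom) for 'crd' or 'rb'.

    data[i][j] is the observation on treatment i+1 at stage j+1
    (for 'rb', stage j+1 is taken to be block j+1).
    """
    k, m = len(data), len(data[0])
    row = [sum(x) / m for x in data]                       # treatment means
    if design == "crd":
        ss = sum((data[i][j] - row[i]) ** 2 for i in range(k) for j in range(m))
        df = k * (m - 1)
    elif design == "rb":
        col = [sum(data[i][j] for i in range(k)) / k for j in range(m)]  # block means
        grand = sum(row) / k
        ss = sum((data[i][j] - row[i] - col[j] + grand) ** 2
                 for i in range(k) for j in range(m))
        df = (k - 1) * (m - 1)
    else:
        raise ValueError("design must be 'crd' or 'rb'")
    return ss / df, df


stages = [[43.94, 75.56], [31.74, 22.66], [20.92, 69.08]]
s2_crd, df_crd = residual_mean_square(stages, "crd")
s2_rb, df_rb = residual_mean_square(stages, "rb")
```

The same observations thus give different variance estimates and different degrees of freedom depending on the design under which they are assumed to have been collected.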


4. NUMERICAL EXAMPLE ILLUSTRATING COMPUTATIONS 
ASSOCIATED WITH USE OF PROCEDURE D 


4.1 Completely randomized design 


Given three normal populations with a common unknown variance. 
It is desired to select the population with the largest population mean, 
and to guarantee that the probability of a correct selection is at least 
0.75 when $\delta_{3,2} \ge 4$ units. It is assumed that a completely randomized 
design is to be used, and that $d_0(1) = 0$. What form does Procedure D 
take? 




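For $k = 3$, $P^* = 0.75$, and $\delta^* = 4$ we have 

$$S_m^2 = \frac{1}{3(m - 1)} \left\{ \sum_{i=1}^{3} \sum_{j=1}^{m} X_{ij}^2 - \frac{1}{m} \sum_{i=1}^{3} \Big( \sum_{j=1}^{m} X_{ij} \Big)^2 \right\};$$

$$Q_{(1)m} = \tfrac{4}{3}\{[D_{(3,1)m} - 4m]^2 - [D_{(3,1)m} - 4m][D_{(3,2)m} - 4m] + [D_{(3,2)m} - 4m]^2\},$$

$$Q_{(2)m} = \tfrac{4}{3}\{[D_{(2,1)m} - 4m]^2 - [D_{(2,1)m} - 4m][D_{(2,3)m} - 4m] + [D_{(2,3)m} - 4m]^2\},$$

$$Q_{(3)m} = \tfrac{4}{3}\{[D_{(1,2)m} - 4m]^2 - [D_{(1,2)m} - 4m][D_{(1,3)m} - 4m] + [D_{(1,3)m} - 4m]^2\};$$

$$M_{(i)m} = \left[1 + \frac{Q_{(4-i)m}}{2m(3m - 3)S_m^2}\right]^{-(3m-1)/2} = \left\{1 + \frac{Q_{(4-i)m}}{2m \sum_{\alpha=1}^{3} \sum_{j=1}^{m} (X_{\alpha j} - \bar{X}_{\alpha\cdot})^2}\right\}^{-(3m-1)/2} \qquad (i = 1, 2, 3);$$

and 

$$(1 - P^*)/P^* = 1/3.$$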

The rule for the exact form of Procedure D is: 

“At the $m$th stage of experimentation ($m = 1, 2, \ldots$) take an observation 
from each of the 3 populations. Starting with $m = 2$, compute 
the statistic $Z_m(3m - 3)$. 

a) If $Z_m(3m - 3) \le 1/3$, stop experimentation and choose the population 
which yielded the largest sum, $Y_{[3]m}$, as the one having the largest 
population mean. 

b) If $Z_m(3m - 3) > 1/3$, take another observation from each of the 
3 populations and compute $Z_{m+1}(3m)$. 

Continue in this manner until the rule calls for stopping.” 

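If the approximation form of Procedure D based on $\tilde{Z}_m(3m - 3)$ 
is to be used, we require 

$$U_{1m} = \frac{1}{3(m - 1)} \left[ 2 - \frac{1}{3 m S_m^2} \left\{ D_{(3,2)m}^2 + D_{(3,1)m}^2 + D_{(2,1)m}^2 - 4m\,(D_{(3,2)m} + D_{(1,2)m}) + 32m^2 \right\} \right],$$

$$U_{2m} = \frac{1}{3(m - 1)} \left[ 2 - \frac{1}{3 m S_m^2} \left\{ D_{(3,2)m}^2 + D_{(3,1)m}^2 + D_{(2,1)m}^2 - 4m\,(D_{(3,1)m} + D_{(2,1)m}) + 32m^2 \right\} \right],$$

and 

$$\tilde{Z}_m(3m - 3) = \exp\{-(4D_{(3,1)m}/S_m^2)(1 + U_{1m})\} + \exp\{-(4D_{(3,2)m}/S_m^2)(1 + U_{2m})\}.$$

The rule for this approximation form of Procedure D is then the same as 
the rule for the exact form of Procedure D with $\tilde{Z}_m(3m - 3)$ replacing 
$Z_m(3m - 3)$. 

If the approximation form of Procedure D based on $\tilde{\tilde{Z}}_m(3m - 3)$ 
is to be used, we require 

$$\tilde{\tilde{Z}}_m(3m - 3) = \exp\{-4D_{(3,1)m}/S_m^2\} + \exp\{-4D_{(3,2)m}/S_m^2\}.$$

The rule for this approximation form of Procedure D is then the same as 
the rule for the exact form of Procedure D with $\tilde{\tilde{Z}}_m(3m - 3)$ replacing 
$Z_m(3m - 3)$. 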

If $\sigma$ is known, and is equal to (say) 20 units, then Procedure C 
based on $Z_m(\infty)$ is to be used, and we require 

$$Z_m(\infty) = \exp\{-0.01\,D_{(3,1)m}\} + \exp\{-0.01\,D_{(3,2)m}\}.$$

The rule for Procedure C is then the same as the rule for the exact form 
of Procedure D with $Z_m(\infty)$ replacing $Z_m(3m - 3)$, and the computation 
of $Z_m(\infty)$ starting with $m = 1$. 

In order to illustrate the application of the exact and approximation 
forms of Procedure D, we have applied them to some artificial data. The 
$X_{ij}$ in Table 1 are random deviates from three normal populations with 
the following population means and standard deviations: 


Population 1:   $X_{1j}$   $\mu_1 = 44$   $\sigma_1 = 20$ 
Population 2:   $X_{2j}$   $\mu_2 = 40$   $\sigma_2 = 20$ 
Population 3:   $X_{3j}$   $\mu_3 = 40$   $\sigma_3 = 20$. 


Note that the population means are in the least favorable configuration. 
The data associated with 23 stages of experimentation are listed. 
Tables 1 and 2a give in detail the computations necessary to apply 
the exact form of Procedure D based on $Z_m(3m - 3)$. It is to be noted 
that experimentation with this procedure would stop at the 17th stage 
since at this stage $Z_m(3m - 3)$ is less than 1/3 for the first time; a correct 
selection would have been made. 
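The stopping stage can be checked with a short sketch that steps through the observations of Table 1. It is an illustration under stated assumptions rather than the computing scheme of Tables 1 and 2a: the names `TABLE_1`, `exact_z`, and `run_procedure_d` are invented, the individual observations for stages 17, 19, and 20 are taken as successive differences of the tabulated sums, and the statistic is evaluated in the ratio form with the pooled within-population mean square on $3m - 3$ degrees of freedom.

```python
# Observations X_1m, X_2m, X_3m for stages m = 1..23 (Table 1).
TABLE_1 = [
    (43.94, 31.74, 20.92), (75.56, 22.66, 69.08), (20.62, 43.46, 39.40),
    (31.22, 31.14, 43.48), (56.50, 35.62, 44.78), (34.78, 45.20, 40.36),
    (32.32, 28.90, 52.50), (67.38, 45.02, 35.22), (37.86, 33.14, 50.56),
    (16.42, 43.80, 65.36), (66.94, 81.60, 51.92), (60.28, 24.68, 26.38),
    (59.56, 20.72, 42.94), (79.58, 28.04, 30.16), (22.76, 53.24, 51.38),
    (84.80, 18.36, 28.12), (49.62, 65.92, 20.58), (46.22, 53.52, 33.20),
    (44.14, 46.62, 68.04), (37.76, 52.54, 21.42), (38.90, 69.64, 49.66),
    (76.90, 79.66, 11.02), (74.84, 9.92, 51.06),
]


def exact_z(rows, delta_star):
    """Exact statistic after len(rows) >= 2 stages (completely randomized design)."""
    m, k = len(rows), len(rows[0])
    cols = [[r[i] for r in rows] for i in range(k)]
    ss = sum(sum((v - sum(c) / m) ** 2 for v in c) for c in cols)   # d_m * S_m^2
    d = k * (m - 1)
    y = sorted(sum(c) for c in cols)                                # ranked sums
    t = m * delta_star
    terms = []
    for i in range(k):
        x = [y[i] - y[a] - t for a in range(k) if a != i]
        q = 2.0 * sum(v * v for v in x) - (2.0 / k) * sum(x) ** 2
        terms.append((1.0 + q / (2.0 * m * ss)) ** (-(d + k - 1) / 2.0))
    return sum(terms[:-1]) / terms[-1]


def run_procedure_d(rows, delta_star, p_star):
    """Return (stopping stage, selected population, statistic) or None."""
    limit = (1.0 - p_star) / p_star
    for m in range(2, len(rows) + 1):
        z = exact_z(rows[:m], delta_star)
        if z <= limit:
            totals = [sum(r[i] for r in rows[:m]) for i in range(len(rows[0]))]
            return m, totals.index(max(totals)) + 1, z
    return None


result = run_procedure_d(TABLE_1, delta_star=4.0, p_star=0.75)
```

The returned triple gives the stage at which the rule first calls for stopping, the population selected (numbered as in Table 1), and the value of the statistic at that stage.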




TABLE 1 

COMPUTATIONS FOR PROCEDURE D 

 m |  X_1m   X_2m   X_3m |    Y_1m    Y_2m    Y_3m | D_(3,1)m D_(3,2)m D_(2,1)m 

 1 | 43.94  31.74  20.92 |   43.94   31.74   20.92 |   23.02    12.20    10.82 
 2 | 75.56  22.66  69.08 |  119.50   54.40   90.00 |   65.10    29.50    35.60 
 3 | 20.62  43.46  39.40 |  140.12   97.86  129.40 |   42.26    10.72    31.54 
 4 | 31.22  31.14  43.48 |  171.34  129.00  172.88 |   43.88     1.54    42.34 
 5 | 56.50  35.62  44.78 |  227.84  164.62  217.66 |   63.22    10.18    53.04 

 6 | 34.78  45.20  40.36 |  262.62  209.82  258.02 |   52.80     4.60    48.20 
 7 | 32.32  28.90  52.50 |  294.94  238.72  310.52 |   71.80    15.58    56.22 
 8 | 67.38  45.02  35.22 |  362.32  283.74  345.74 |   78.58    16.58    62.00 
 9 | 37.86  33.14  50.56 |  400.18  316.88  396.30 |   83.30     3.88    79.42 
10 | 16.42  43.80  65.36 |  416.60  360.68  461.66 |  100.98    45.06    55.92 

11 | 66.94  81.60  51.92 |  483.54  442.28  513.58 |   71.30    30.04    41.26 
12 | 60.28  24.68  26.38 |  543.82  466.96  539.96 |   76.86     3.86    73.00 
13 | 59.56  20.72  42.94 |  603.38  487.68  582.90 |  115.70    20.48    95.22 
14 | 79.58  28.04  30.16 |  682.96  515.72  613.06 |  167.24    69.90    97.34 
15 | 22.76  53.24  51.38 |  705.72  568.96  664.44 |  136.76    41.28    95.48 

16 | 84.80  18.36  28.12 |  790.52  587.32  692.56 |  203.20    97.96   105.24 
17 | 49.62  65.92  20.58 |  840.14  653.24  713.14 |  186.90   127.00    59.90 
18 | 46.22  53.52  33.20 |  886.36  706.76  746.34 |  179.60   140.02    39.58 
19 | 44.14  46.62  68.04 |  930.50  753.38  814.38 |  177.12   116.12    61.00 
20 | 37.76  52.54  21.42 |  968.26  805.92  835.80 |  162.34   132.46    29.88 

21 | 38.90  69.64  49.66 | 1007.16  875.56  885.46 |  131.60   121.70     9.90 
22 | 76.90  79.66  11.02 | 1084.06  955.22  896.48 |  187.58   128.84    58.74 
23 | 74.84   9.92  51.06 | 1158.90  965.14  947.54 |  211.36   193.76    17.60 


Table 3 gives $U_{1m}$ and $U_{2m}$ which are necessary to apply the approximation 
form of Procedure D based on $\tilde{Z}_m(3m - 3)$. Experimentation 
with this form of the procedure would also stop at the 17th stage, and a 
correct decision would have been made. It is to be noted that although 
$U_{1m}$ and $U_{2m}$ do not approach zero very rapidly, the statistic $\tilde{Z}_m(d_m)$ does 
approach the statistic $Z_m(d_m)$ very rapidly. Hence, for moderate values 
of $d_m$ the approximation would appear to be a good one. 

Table 4 compares $\tilde{\tilde{Z}}_m(d_m)$, $\tilde{Z}_m(d_m)$, $Z_m(d_m)$, and $Z_m(\infty)$ as a function of 
$m$ when $d_m = 3m - 3$. Experimentation with the approximation form of 
Procedure D based on $\tilde{\tilde{Z}}_m(d_m)$ would also stop at the 17th stage, but it is 
to be noted that $\tilde{\tilde{Z}}_m(d_m)$ and $Z_m(d_m)$ differ by slightly more than 0.01 at 
that stage. In general, it would appear that it might not be advisable 


TABLE 2a 
COMPUTATIONS FOR PROCEDURE D FOR A COMPLETELY RANDOMIZED DESIGN 
[Table body not legible in the source.] 


TABLE 2b 
COMPUTATIONS FOR PROCEDURE D FOR A RANDOMIZED BLOCKS DESIGN 
[Table body not legible in the source.] 


TABLE 2c 
COMPUTATIONS FOR PROCEDURE D FOR A CROSS-OVER DESIGN 
[Table body not legible in the source.] 


TABLE 2d 
COMPUTATIONS FOR PROCEDURE D FOR A LATIN SQUARE DESIGN (REPLICATED) 
[Table body not legible in the source.] 


TABLE 3 
FACTORS FOR COMPUTING $\tilde{Z}_m(d_m)$ OF PROCEDURE D FOR A COMPLETELY 
RANDOMIZED DESIGN 


 m      U_1m        U_2m     | $\tilde{Z}_m(d_m)$ 
 2    0.024585    0.108309   |  1.4186 
 3    0.202630    0.245854   |  1.5544 
 4    0.090913    0.145538   |  1.5533 
 5   -0.008564    0.056500   |  1.2510 
 6    0.016699    0.071915   |  1.3164 
 7   -0.033357    0.027209   |  0.9935 
 8   -0.033986    0.021768   |  0.9647 
 9   -0.067855    0.001967   |  1.1156 
10   -0.036323    0.002690   |  0.5869 
11    0.022279    0.042334   |  0.9789 
12    0.001888    0.034073   |  1.2697 
13   -0.030638    0.009165   |  0.9177 
14   -0.058501   -0.022694   |  0.4798 
15   -0.027595    0.004741   |  0.7052 
16   -0.062544   -0.031932   |  0.3716 
17   -0.035688   -0.019757   |  0.3042 
18   -0.029083   -0.018760   |  0.2591 
19   -0.023987   -0.008815   |  0.3112 
20   -0.013918   -0.006802   |  0.2815 
21    0.000313    0.002543   |  0.3612 
22   -0.015137   -0.003875   |  0.3196 
23   -0.022933   -0.019826   |  0.1996 


to use this approximation form of Procedure D when $d_m$ is this small. 

If observations are cheap relative to computing costs, the following 
procedure appears to be a reasonable one: Use $\tilde{\tilde{Z}}_m(d_m)$ until the rule calls 
for stopping; from that point on use $\tilde{Z}_m(d_m)$ until the rule calls for 
stopping; finally use $Z_m(d_m)$ until the rule calls for stopping. This 
use of the approximations will cut down on the quantity of complicated 
computations at the expense of a small increase in the average sample 
size. (It is interesting to note that if the above rule had been applied 
to the particular numerical example which was considered, it would have 
been necessary to compute $\tilde{Z}_m(d_m)$ and $Z_m(d_m)$ only once, namely for 
$m = 17$.) Also, a substantial decrease in the volume of computations 
can be effected by computing only those rows of Table 2a which are 
associated with (say) even values of $m$; if this is done, the requirement 
still will be satisfied, but the average sample size will increase slightly. 


TABLE 4 

COMPARISON OF VALUES OF DECISION STATISTICS FOR PROCEDURE D FOR A 
COMPLETELY RANDOMIZED DESIGN 


 m   $\tilde{\tilde{Z}}_m(d_m)$   $\tilde{Z}_m(d_m)$   $Z_m(d_m)$   $Z_m(\infty)$ 
 1      -        -        -      1.6795 
 2   1.4439   1.4186   1.4269   1.2661 
 3   1.6221   1.5544   1.5611   1.5537 
 4   1.5830   1.5533   1.5563   1.6295 
 5   1.2551   1.2510   1.2508   1.4346 
 6   1.3277   1.3164   1.3169   1.5448 
 7   0.9878   0.9935   0.9919   1.3435 
 8   0.9579   0.9647   0.9632   1.3030 
 9   1.0938   1.1156   1.1121   1.3967 
10   0.5772   0.5869   0.5854   1.0015 
11   0.9989   0.9789   0.9794   1.2307 
12   1.2722   1.2697   1.2697   1.4258 
13   0.9101   0.9177   0.9165   1.1292 
14   0.4576   0.4798   0.4774   0.6849 
15   0.6988   0.7052   0.7045   0.9165 
16   0.3474   0.3716   0.3692   0.5065 
17   0.2896   0.3042   0.3030   0.4351 
18   0.2470   0.2591   0.2581   0.4125 
19   0.3029   0.3112   0.3106   0.4832 
20   0.2760   0.2815   0.2810   0.4631 
21   0.3621   0.3612   0.3610   0.5643 
22   0.3147   0.3196   0.3192   0.4289 
23   0.1899   0.1996   0.1990   0.2649 

It is interesting to note that Procedure C, assuming a known standard 
deviation of 20, would stop at the 23rd stage which is later than the 
stage at which the exact form of Procedure D stopped. This phenome- 
non, however, is not typical, and on the average, Procedure C will stop 
before the exact form of Procedure D. (In the particular experiment 
under consideration, $\sigma^2$ was sizably, although not significantly, underestimated 
at the termination of experimentation.) Monte Carlo 
sampling with Procedure C under these same conditions yielded an 
average sample size (per population) of 41.3.⁵ 
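A Monte Carlo experiment of this kind can be sketched as follows. It is an illustration only, not a reconstruction of the sampling actually carried out: the function name, the seed, the number of replications, and the cap on the number of stages are arbitrary assumptions, and the known-variance statistic is taken as the sum of $\exp\{-\delta^* D/\sigma^2\}$ over all populations other than the one with the largest sum. The conditions are those of the example (means 44, 40, 40; $\sigma = 20$; $\delta^* = 4$; $P^* = 0.75$).

```python
import math
import random


def simulate_procedure_c(means, sigma, delta_star, p_star, rng, max_stages=100000):
    """Run the known-variance rule once; return (stages used, correct selection?)."""
    k = len(means)
    sums = [0.0] * k
    limit = (1.0 - p_star) / p_star
    best = means.index(max(means))
    for m in range(1, max_stages + 1):
        for i in range(k):
            sums[i] += rng.gauss(means[i], sigma)       # one observation per population
        top = max(sums)
        # the term for the leading population equals exp(0) = 1, so subtract it
        z = sum(math.exp(-delta_star * (top - s) / sigma ** 2) for s in sums) - 1.0
        if z <= limit:
            return m, sums.index(top) == best
    raise RuntimeError("no decision reached within max_stages")


rng = random.Random(1958)                               # arbitrary seed
runs = [simulate_procedure_c([44.0, 40.0, 40.0], 20.0, 4.0, 0.75, rng)
        for _ in range(2000)]
average_stages = sum(m for m, _ in runs) / len(runs)
proportion_correct = sum(ok for _, ok in runs) / len(runs)
```

The estimated average number of stages and proportion of correct selections vary with the seed and the number of replications, so they should be read only as rough checks against the figures quoted in the text.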


⁵Average sample sizes and related data associated with Procedure C are contained in [9]. 




4.2 Other experimental designs 


In order to indicate to the reader how the computations would have 
been modified if a more complicated design than the completely ran- 
domized one had been used, three additional sample tables have been 
prepared. These are given as Tables 2b, 2c, and 2d and are associated 
with the randomized blocks design, the cross-over design, and the Latin 
square design (replicated), respectively. The computations in each of 
these tables were made as if the $X_{ij}$ ($i = 1, 2, 3$) listed in Table 1 were 
obtained from an experiment which used the associated design. The 
computations were cut off at $m = 6$ since it was felt that this number of 
rows sufficed to illustrate the method. The reader is reminded that for 
all designs, the treatment “effects” $\alpha_1$, $\alpha_2$, and $\alpha_3$ are associated with 
the columns headed $X_{1m}$, $X_{2m}$, and $X_{3m}$, respectively, in Table 1; this 
is so because of the way that the observations were generated. 

It should be pointed out that for any given experimental design, the 
number of degrees of freedom, $d_m(2)$, available for estimating $\sigma^2$ builds up 
very rapidly even for moderately small $k$. Thus, for example, at the 24th 
stage of experimentation $d_{24}(2)$ has the following values: 


                             Number of populations (k) 

Experimental design            3     4     6     8    12 

Completely randomized         69    92   138   184   276 
Randomized blocks             46    69   115   161   253 
Cross-over                    44    66   110   154   242 
Latin square, replicated      23    46    92   138   230 


Hence, the greater the number of populations, the more rapidly Pro- 
cedure D becomes equivalent to Procedure C. 
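The tabulated values follow directly from the degrees-of-freedom formulas given in Section 3.2 for each design. A minimal sketch, in which the function name `residual_df` and the design labels are assumptions made for this illustration:

```python
def residual_df(design, k, m):
    """Degrees of freedom d_m(2) after m stages with k populations."""
    formulas = {
        "completely randomized": k * (m - 1),
        "randomized blocks": (k - 1) * (m - 1),
        "cross-over": (k - 1) * (m - 2),
        "latin square": (k - 2) * (m - 1),
    }
    return formulas[design]


populations = (3, 4, 6, 8, 12)
table_at_24 = {
    design: [residual_df(design, k, 24) for k in populations]
    for design in ("completely randomized", "randomized blocks",
                   "cross-over", "latin square")
}
```

Each of the listed values of $k$ divides 24, so the 24th stage is one at which the statistic can be computed for the cross-over and Latin square designs as well.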


5. CONCLUDING REMARKS 


In the interest of keeping the exposition as simple as possible, only 
one goal, that of selecting the population with the largest population 
mean, was considered. However, more complicated goals such as 
selecting the $t$ ($1 < t \le k - 1$) populations with the largest population 
means, or selecting and ordering the $t$ populations with the largest 
population means also can be treated within the same general theoretical 
framework. 

In addition, all of the foregoing has been concerned with single-factor 
experiments. However, Procedure D can be generalized to deal with 




multi-factor experiments. For example, in two-factor experiments 
for which it can be assumed a priori that there is no interaction, the 
generalized procedure can be used to select simultaneously the “level” 
of the first factor and the “level” of the second factor associated with 
the largest population mean “effect” for each factor. And such a 
factorial experiment is more efficient (in terms of total average sample 
size) than two single-factor experiments. Of course, two-factor experi- 
ments for which it can be assumed a priori that there is a sizable inter- 
action can be regarded as single-factor experiments, the goal being to 
select that particular combination of first and second factor “levels” 
which is associated with the largest population mean “effect.” If the 
problem is looked at in this way, the present form of Procedure D 
can be used. 

The author conjectures that Procedure D is relatively insensitive to 
lack of normality of the $X_{ij}$. For the procedure depends on the $X_{ij}$ 
only through the $Y_{im}$ and $S_m^2$. And the distribution of the $Y_{im}$ approaches 
normality as $m$ grows large while $S_m^2$ is an unbiased estimate of the 
common $\sigma^2$ regardless of the distributions of the $X_{ij}$. However, the 
procedure can be expected to be sensitive to violations of the assumption 
of a constant common variance. And, in fact, it will break down com- 
pletely if the violation is a severe one. With respect to the above 
violations, the probability of a correct selection for Procedure D reacts 
in the same way as does the power of an Analysis of Variance test of 
homogeneity. And any set of conditions which will permit the experi- 
menter to make a firm statement about the latter will also permit him 
to make a firm statement about the former. 

At the present time little is known about the average sample sizes 
associated with Procedure D. It appears likely that they will be only 
slightly larger than those associated with Procedure C (especially when 
the average sample sizes are not too small). Research along these lines 
is being continued. 

The reader may perhaps be overwhelmed by the volume of the com- 
putations associated with the exact form of Procedure D, especially 
when k is large. This labor is in a sense the penalty that he must pay 
for failure to know the numerical value of the common variance. (Pro- 
cedure C which assumes a knowledge of this common variance is much 
simpler to apply.) If observations are very expensive and must be 
held to an absolute minimum, the author sees little hope for relief, 
except perhaps by programming the procedure on a high-speed electronic 
computing machine. 

The ultimate test of the worth of the ideas set forth in this paper, 
and of Procedure D, in particular, must come when they are applied in 




the field. The author anticipates certain initial difficulties in accepting 
the formulation proposed, in specifying $\delta^*$ and $P^*$, in finding situations 
for which sequential experimentation (of the type described in this paper) 
is appropriate, and in applying Procedure D. All of these initial diffi- 
culties notwithstanding, the author believes that there are many 
practical problems for which Procedure D can and should be used, and 
he would welcome the comments of experimenters who report on their 
experience with it. 


6. SUMMARY 


A sequential multiple-decision procedure for selecting the best one 
of several normal populations with a common unknown variance is 
described. The procedure is a generalization of one which was previously 
proposed to handle the same problem when the common variance is 
known. Both procedures terminate with probability unity and guarantee 
that the probability of a correct selection is at least equal to some 
specified probability whenever the largest population mean is greater 
than the second-largest by some specified amount. For any given con- 
figuration of the population means, the average sample size required 
for termination of experimentation decreases as the population variance 
underlying the experiment decreases. Hence, the use of an experimental 
design which is effective in cutting down on this underlying variance 
will result in a decrease in the average sample size required for termina- 
tion. The average sample size also decreases as the configuration of 
the population means becomes more favorable. 

The method of applying the procedure with various experimental 
designs is described. Approximate methods involving substantially 
less computation than the exact method are also given. A worked-out 
numerical example is provided. 


7. ACKNOWLEDGMENTS 


The author is extremely grateful to Messrs. Norman Morse and 
Salah Elmaghraby who computed the entries in Tables 1 to 4. Thanks 
are also due to Dr. Milton Sobel of the Bell Telephone Laboratories 
who read this paper with painstaking care and made many helpful 
suggestions. 


REFERENCES 


[1] Bechhofer, R. E. A single-sample multiple-decision procedure for ranking 
means of normal populations with known variances. Ann. Math. Stat. 25 
[1954], pp. 16-39. 




[2] Bechhofer, R. E. Multiple-decision procedures for ranking means. Proc. 
Ninth Ann. Conv. Amer. Soc. Qual. Cont., May 1955, pp. 513-519. 

[3] Bechhofer, R. E., Dunnett, C. W., and Sobel, M. A two-sample multiple- 
decision procedure for ranking means of normal populations with a common 
unknown variance, Biometrika 41 [1954], pp. 170-176. 

[4] Bechhofer, R. E., Elmaghraby, S. A., and Morse, N. A single-sample multiple- 
decision procedure for selecting the multinomial event which has the largest 
probability. Accepted for publication in the Ann. Math. Stat. 

[5] Bechhofer, R. E., Kiefer, J., and Sobel, M. A sequential multiple-decision 
procedure for certain identification and ranking problems. In preparation. 
To be submitted for publication as Part I of the 2-part monograph referred 
to under reference [9]. 

[6] Bechhofer, R. E. and Sobel, M. A single-sample multiple-decision procedure 
for ranking variances of normal populations. Ann. Math. Stat. 25 [1954], 
pp. 273-289. 

[7] Bechhofer, R. E. and Sobel, M. A sequential multiple-decision procedure 
for ranking means of normal populations with known variances (preliminary 
report). Abstract, Ann. Math. Stat. 24 [1953], pp. 136-137. 

[8] Bechhofer, R. E. and Sobel, M. On a sequential ranking procedure (preliminary 
report). Abstract, Bull. Amer. Math. Soc. 60 [1954], pp. 34-35. 

[9] Bechhofer, R. E. and Sobel, M. A sequential multiple-decision procedure 
for ranking parameters of Koopman-Darmois populations, with special 
reference to ranking means of normal populations with a common known 
variance. Submitted for publication as Part II of a 2-part monograph in 
the Statistical Research Monographs series cosponsored by the Institute of 
Mathematical Statistics and the University of Chicago. Reference [5] will 
be submitted as Part I. 

[10] Bechhofer, R. E. and Sobel, M. A sequential multiple-decision procedure 
for selecting the population with the largest mean from k normal populations 
with a common unknown variance (preliminary report). Abstract, Ann. 
Math. Stat. 27 [1956], pp. 218-219. 

[11] Bechhofer, R. E. and Sobel, M. A scale invariant sequential multiple-decision 
procedure for selecting the population with the smallest variance from k 
normal populations (preliminary report). Abstract, Ann. Math. Stat. 27 
[1956], p. 219. 

[12] Bechhofer, R. E. and Sobel, M. A sequential multiple-decision procedure 
for selecting the multinomial event with the largest probability (preliminary 
report). Abstract, Ann. Math. Stat. 27 [1956], p. 861. 

[13] Dunnett, C. W. A multiple comparison procedure for comparing several 
treatments with a control. J. Amer. Stat. Assoc. 50 [1955], pp. 1096-1121. 

[14] Dunnett, C. W. and Sobel, M. A bivariate generalization of Student’s t-distribution 
with tables for certain special cases. Biometrika 41 [1954], pp. 153-169. 

[15] Dunnett, C. W. and Sobel, M. Approximations to the probability integral and 
certain percentage points of a multivariate analogue of Student’s t-distribution. 
Biometrika 42 [1955], pp. 258-260. 

[16] Girshick, M. A. Contributions to the theory of sequential analysis, I. Ann. 
Math. Stat. 17 [1946], pp. 123-143. 

[17] Gupta, S. S. On a decision rule for a problem in ranking means. Institute of 
Statistics Mimeograph Series No. 150, University of North Carolina, May 1956. 

[18] Gupta, S. S. and Sobel, M. On a statistic which arises in selection and ranking 
problems. Submitted for publication in Ann. Math. Stat. 




[19] Hall, W. J. An optimum property of Bechhofer’s single-sample multiple- 
decision procedure for ranking means and some extensions. Institute of 
Statistics Mimeograph Series No. 118, University of North Carolina, 
September 1954. 

[20] Huyett, M. J. and Sobel, M. Selecting the best one of several binomial 
populations. The Bell System Tech. J. 36 [1957], pp. 537-576. 

[21] Sobel, M. Statistical techniques for reducing the experiment time in reliability 
studies. The Bell System Tech. J. 35 [1956], pp. 179-202. 




QUERIES AND NOTES 


George W. Snedecor, Editor 


131 NOTE: ON VARYING ONE FACTOR AT A TIME 


CUTHBERT DANIEL 
New York City 


The scientific worker who has varied each of three factors [suppose 
that he has done the runs (1), a, b, and c] may be told by the statistician 
that he has used an inefficient plan since by varying two factors at a 
time [doing (1), ab, ac, and bc] he could have doubled his precision of 
estimation of each of the three effects A, B, and C. But three of the 
four runs he has made are in the half-replicate with defining contrast 
I = +ABC. By carrying out the run abc he can still reach the doubled 
precision. The run (1) can be used in the contrast [a + b + c − abc − 
2(1)]/4 to get an estimate of −AB − AC − BC + ABC. 
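
The contrast just quoted can be checked by tabulating the coefficient with which each effect enters it. The following is a minimal sketch, assuming the usual model in which each response is the mean plus a signed sum of effect coefficients (signs +1 when a factor is at its high level, −1 otherwise); the function names are illustrative and do not come from the note.

```python
from itertools import combinations

FACTORS = "abc"

def sign(run, effect):
    """Sign (+1 or -1) of an effect column for a treatment combination.

    run    -- letters of the factors at their high level ("" stands for (1))
    effect -- effect label such as "AB"
    """
    s = 1
    for f in effect.lower():
        s *= 1 if f in run else -1
    return s

# All main effects and interactions of a 2^3 factorial: A, B, C, AB, AC, BC, ABC.
EFFECTS = ["".join(c).upper()
           for r in range(1, len(FACTORS) + 1)
           for c in combinations(FACTORS, r)]

# Weights of the contrast [a + b + c - abc - 2(1)] / 4.
WEIGHTS = {"a": 1, "b": 1, "c": 1, "abc": -1, "": -2}

def contrast_coefficients():
    """Coefficient of the mean and of every effect in the contrast."""
    out = {"mean": sum(WEIGHTS.values()) / 4}
    for e in EFFECTS:
        out[e] = sum(w * sign(run, e) for run, w in WEIGHTS.items()) / 4
    return out

if __name__ == "__main__":
    for name, coef in contrast_coefficients().items():
        print(f"{name:>4}: {coef:+.0f}")
```

Under the stated model the mean and the three main effects drop out, and the remaining coefficients carry the signs given in the text.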

This type of augmentation appears more advantageous if four 
or five factors are included in the first set of one-at-a-time runs. Thus 
if (1), a, b, c, and d have been done, they can be augmented by abc, 
abd, acd, and bcd to complete the $2^{4-1}$ with defining contrast 
I = −ABCD. The experimenter has then, by adding four runs, both 
quadrupled his precision of estimation of the four main effects, and 
freed these estimates of two-factor interaction confounding. He has 
in effect made up for the time lost earlier in varying one factor at a time 
by later varying three at a time. 

For five factors, the augmentation of the first six single-factor runs 
to the $2^{5-1}$ with defining contrast I = +ABCDE requires the addition 
of eleven runs (ten for the three-letter combinations, one for the five- 
letter run). By roughly tripling the amount of work done, all five effects 
are estimated with eight-fold precision and all two-factor interactions 
are separately estimable. 
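
The run counts and precision gains quoted for three, four, and five factors can be reproduced with a short enumeration. This is only a sketch under stated assumptions: the half-replicate is taken to be the set of treatment combinations with an odd number of letters (which contains the single-letter runs), and the precision gain is measured as the ratio of variances of a main-effect estimate under equal, independent errors. The function names are hypothetical.

```python
from itertools import combinations

def augmentation_runs(factors):
    """Runs to add so that the single-factor runs lie in a half-replicate.

    The half-replicate assumed here consists of every treatment combination
    with an odd number of letters; the one-letter runs are already done.
    """
    letters = list(factors)
    half_replicate = ["".join(c)
                      for r in range(1, len(letters) + 1, 2)
                      for c in combinations(letters, r)]
    done = set(letters)
    return [run for run in half_replicate if run not in done]

def precision_gain(k):
    """Variance ratio: one-at-a-time estimate over half-replicate estimate.

    One at a time: effect = y_a - y_(1), variance 2*sigma^2.
    Half-replicate of 2^(k-1) runs: difference of two means of 2^(k-2)
    runs each, variance 2*sigma^2 / 2^(k-2).
    """
    return 2 ** (k - 2)

if __name__ == "__main__":
    for factors in ("abc", "abcd", "abcde"):
        added = augmentation_runs(factors)
        print(len(factors), "factors:", len(added), "added runs,",
              "gain", precision_gain(len(factors)))
```

For three factors the only added run is abc; for four factors the added runs are abc, abd, acd, and bcd.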

Since one-factor-at-a-time sets can only be augmented to half- 
replicates, it does not seem likely that more than five factors will be so 
used. 

These step-wise operations violate the requirement of randomi- 
zation in some degree, but randomization of the later part of the plan 
should be practicable. 




Fractional factorial experiments must be completed before any 
estimates of effects can be made. When runs can (or must) be carried 
out in sequence, conclusions can be drawn sooner from one-factor-at-a- 
time experiments. Even though such conclusions are of limited validity 
and of minimal precision, it may be worthwhile to take an early look 
in order to decide whether 


a. some factor has been varied over too wide a range. If a very 
large effect appears, the experimenter may wish to take a smaller 
step, or he may wish to move even further in a favorable direction. 

b. more runs must be made to attain better precision in estimating 
the effects of some of the factors. The augmentations given 
above will then secure maximum gains in precision and validity. 


Since each of the half-replicates described is included in its successors, 
new factors can be added or old ones deleted before or during 
augmentation. 




132 NOTE: ERRATA AND EXTENSIONS FOR “THE 
DISTRIBUTION OF THE EUROPEAN CORN 
BORER LARVAE PYRAUSTA NUBILALIS 
(HBN.), IN FIELD CORN”* 


JUDSON U. McGUIRE, TOM A. BRINDLEY, T. A. BANCROFT 
Iowa State College and the U. S. D. A. Agricultural Research Service 


ERRATA 


The authors of the paper “The Distribution of the European Corn 
Borer Larvae Pyrausta Nubilalis (HBN.), in Field Corn,” Biometrics, 
Vol. 13, No. 1, March 1957, are grateful to readers who have called 
their attention to one misprint in the text and certain numerical in- 
accuracies and oversights in Table 1, page 72, and the Appendix, 
pages 74-78. Since it appeared likely that others (Sprott [1958]) 
interested in fitting compound distributions would wish to make use 
of the data and results, a recalculation of all numerical work was under- 
taken. The resulting corrections are presented below: 


1. In the text. 
(i) p. 69 line 17 should read p < 1. 


2. In the appendix. 


(i) p. 74. Distribution 1. The Observed frequency column 
should contain no bracket, that is, all eleven frequency 
classes were used. The Theoretical frequency of the 2 
borers class for NB should be 743.06 instead of 734.06. 
The corresponding chi-square values are correct as given 
in Table 1. 

(ii) p. 75. Distribution 2. The Observed frequency column 
should combine classes 8, 9, 10, so there were altogether 9 
classes used. The Theoretical frequency of the 0 borer class 
for NB should be 553.85 instead of 553.18, and for the 10 
borer class for NTA should be 1.73 instead of 1.92. The 
corrected chi-square for NTA is 9.92. 

(iii) p. 76. Distribution 3. The heading in the table should be 
N = 324 instead of N = 311. 

(iv) p. 77. Distribution 5. The Observed frequency column 
should show three brackets instead of one. Frequencies 10 
and 11, 12 and 13, and 14 through 25 were grouped separately. 
The Theoretical frequency of the 12 borer class for NTA 
should be 5.36 instead of 5.06 and the corresponding 
chi-square 16.33 instead of 16.17. 


*Journal Paper No. 3454 of the Iowa Agricultural Experiment Station, Ames, Iowa. Project 
No. 1193. 

(v) p. 77. Distribution 6. The Theoretical frequency of the 
4 borer class should be 4.42 instead of 4.31 for PB and the 
corresponding chi-square is 0.47 instead of 0.61. 

(vi) p. 78. Distribution 9. The Observed frequency column 
should combine frequencies at 4 and 5, so there were al- 
together 5 classes used. The Theoretical frequency of the 0 
borer class for PB should read 187.02 instead of 197.02. 
The chi-square values are unchanged. 


3. In Table 1, p. 72. 


(i) The entries in the Degrees of freedom column should all be 
reduced by 1. 

(ii) The probabilities (p) for the corresponding chi-square values 
were calculated more precisely as follows: NB (< 0.0001, 
0.017, 0.808, 0.950, 0.145, 0.162, 0.518, 0.243, 0.305), 
NTA (< 0.001, 0.135, 0.845, 0.617, 0.091, 0.261, 0.281, 
0.530, 0.530), PB (< 0.005, 0.313, 0.862, < 0.005, 0.062, 
0.494, < 0.001, 0.830, 0.588). 


Fortunately the above corrections lead to no change in conclusions 
under the assumptions made in the paper. 


EXTENSIONS 


Quoting from page 71, first paragraph of the paper, “In most cases 
the difference between the various fitted distributions is slight, and in 
some cases, for practical purposes, no distinction would be made between 
two or even all three of them. Had more efficient methods of estimation, 
consistent with the selected tests of goodness-of-fit, been available, the 
Poisson binomial distribution might perhaps have been fitted more 
closely, but even using the above estimating procedures, this model 
gave on the whole closer approximations to the data than did the nega- 
tive binomial, as measured by an ordinary chi-square goodness-of-fit 
test.” 

Now, in order to be completely consistent, it would seem logically 
desirable to use the same method of fitting for the negative binomial, 
the Neyman Type A, and the Poisson binomial distributions followed 
by a common consistent test of goodness-of-fit. That is, should the 
common method of fitting be the method of maximum likelihood, then 
it would seem appropriate to follow with a common likelihood ratio 
test of goodness-of-fit. On the other hand, it would seem desirable to 
follow a minimum chi-square fitting procedure with a chi-square good- 
ness-of-fit test. Presumably the choice of a particular common fitting 
procedure and a common consistent test of goodness-of-fit would 
depend upon the relative merits of the properties of both the estimating 
procedure and the testing procedure. Further, for routine fitting of 
distributions, one is always concerned with the availability of pro- 
cedures and, particularly, ones that do not involve tedious calculations. 
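
The two goodness-of-fit criteria contrasted in this paragraph can be written down in a few lines. The sketch below computes the ordinary (Pearson) chi-square and the likelihood ratio statistic from observed and fitted class frequencies; the counts used are hypothetical and serve only to exercise the functions, and nothing here reproduces the fitting procedures of the paper.

```python
import math

def pearson_chi_square(observed, expected):
    """Ordinary chi-square: sum of (O - E)^2 / E over the classes."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

def likelihood_ratio_statistic(observed, expected):
    """Likelihood ratio statistic: 2 * sum of O * ln(O / E).

    Classes with a zero observed count contribute nothing to the sum.
    """
    return 2.0 * sum(o * math.log(o / e)
                     for o, e in zip(observed, expected) if o > 0)

if __name__ == "__main__":
    observed = [10, 20, 30]          # hypothetical observed frequencies
    expected = [20.0, 20.0, 20.0]    # hypothetical fitted frequencies
    print(pearson_chi_square(observed, expected))
    print(likelihood_ratio_statistic(observed, expected))
```

Both statistics are referred to the same chi-square distribution in large samples, which is why the choice between them rests on the companion estimation method rather than on the reference distribution.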

While no attempt will be made here to follow the suggestions outlined 
in the above paragraph, it is proposed to use the results obtained 
recently by Sprott [1958] to compare the ordinary chi-square goodness- 
of-fit values for distributions 4, 6, and 7 when the negative binomial and 
Poisson binomial are both fitted by the method of maximum likelihood. 
For the Poisson binomial the previously obtained chi-square values for 
fitting by moments are also given for comparison. 


TABLE 1' 
NEGATIVE BINOMIAL AND POISSON BINOMIAL FITTED TO DISTRIBUTIONS 4, 6, AND 7 

                                   Chi-square Goodness-of-fit 
Observed       Degrees of   Negative binomial     Poisson binomial 
distribution   freedom           M. L.          Moments      M. L. 
     4                           0.71            17.47        5.71 
     6             1             1.94             0.47        0.55 
     7             4             3.26            18.90        9.88 


It will be noted in Table 1’ that fitting by maximum likelihood 
results in general in a closer fit for the Poisson binomial than by the 
method of moments, which bears out the conjecture of the quotation 
in paragraph one of this section. However, the chi-square values for 
the Poisson binomial for distributions 4 and 7, when fitted by maximum 
likelihood, are still appreciably larger than those for the negative 
binomial. Also for distribution 6, the chi-square value for the negative 
binomial is still appreciably larger than that for the Poisson binomial 
when both are fitted by maximum likelihood. 
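
To place chi-square values such as those for distributions 6 and 7 on a probability scale, the upper-tail probability for integer degrees of freedom can be computed with the standard recurrence on the degrees of freedom. The routine below is an illustrative sketch using only the Python standard library; it is not the computation used by the authors, and the function name is an assumption.

```python
import math

def chi_square_upper_tail(x, df):
    """P(X > x) for a chi-square variate with integer df >= 1.

    Starts from the closed forms for 1 and 2 degrees of freedom and applies
    Q(k + 2) = Q(k) + (x/2)^(k/2) * exp(-x/2) / Gamma(k/2 + 1).
    """
    if df < 1 or x < 0:
        raise ValueError("need df >= 1 and x >= 0")
    half = x / 2.0
    if df % 2 == 0:
        q, k = math.exp(-half), 2
    else:
        q, k = math.erfc(math.sqrt(half)), 1
    while k < df:
        q += half ** (k / 2.0) * math.exp(-half) / math.gamma(k / 2.0 + 1.0)
        k += 2
    return q

if __name__ == "__main__":
    # Chi-square values and degrees of freedom as printed for
    # distributions 6 and 7 (negative binomial, Poisson binomial M. L.).
    for label, chi2, df in [("6 NB", 1.94, 1), ("6 PB", 0.55, 1),
                            ("7 NB", 3.26, 4), ("7 PB", 9.88, 4)]:
        print(label, round(chi_square_upper_tail(chi2, df), 3))
```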


REFERENCE 


[1] Sprott, D. A. The method of maximum likelihood applied to the Poisson binomial 
distribution. Biometrics 14: 97-102. March 1958. 




ABSTRACTS 


Papers read at the 5th Biometric Colloquy at Bad Nauheim, Germany, 
January 24 — 26, 1958. 


487 R. K. BAUER (Krefeld): On Non-parametric Significance Tests 
of Multiple and Partial Correlation Coefficients and Multiple 
Regression Coefficients. 


There are two ways of testing significance if in correlation analysis 
the normal distribution assumptions are not fulfilled: (1) Testing by 
normal distribution methods with an arbitrary safety adjustment. 
(2) Non-parametric testing by means of the BIENAYME-CHEBY- 
SHEV inequality. The latter method involves the non-parametric 
computation of variances under the null hypothesis, which were obtained 
as 


for the multiple correlation coefficient, 


for the partial correlation coefficient, 
~ _Var(z) 
var(z) 


for the multiple regression coefficient. 


488 G. FREYTAG (Zwickau): Factor Analysis of Electroencephalographical 
and Psychological Test Data. 


Study of the relations between the e.e.g.-characteristics (alpha-frequency, 
potential, and index) and psychological features. Correlations 
between e.e.g.-characteristics and psychological test results were 
significant only to a small extent. A THURSTONE factor analysis 
yielded 3 significant factors: (1) factor of flexibility, plasticity, (2) factor 
of complexity, stabilization, (3) factor of vital energy. It could be 
demonstrated that the psychological tests used were codetermined only 
to a smaller extent by the 3 extracted factors. 


489 E. GOLTNER (Mainz): Correlations in Iron Metabolism of Mother, 
Placenta and Fetus. 


By partial correlation analysis certain relations between maternal 
serum iron content, ferritine and hemosiderine content of placenta 
and of liver, spleen and kidney of stillborn fetuses are evaluated. 


H. GRIMM (Jena): Notes on the Application of the Chi-square Test. 


Review on the applicability of the chi-square goodness-of-fit, 
contingency, and dispersion tests when expectations are small. Poisson 
probability paper facilitates decision as to alternative distributions. 


491 U. HACKENBERG (Brackwede): Measure and Computation with 
a Special System of Numbers (WL-system) in the Pharmacological 
Laboratory. 


Description and illustration of an arithmetical calculus based on 
pharmacological and biomathematical principles (WEBER-FECHNER 
law, biological variation), consisting of fixed decimalized geometric 
series. A group of 48 numbers from these series, and their products 
with integer powers of 10, suffice for most practical purposes. The 
restriction to 24 or 48 numbers per decade facilitates smooth computa- 
tional procedures, documentation, and pharmacological decisions. Use 
of logarithms to the base geometric progression factor further reduces 
arithmetical labor. 
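
The arithmetic advantage of a fixed geometric series can be illustrated as follows. The abstract does not list the numbers of the WL-system, so this sketch assumes the simplest such series, with step factor 10^(1/24) (or 10^(1/48)) and no rounding of the entries; every name in it is hypothetical. Taking logarithms to the base of the step factor turns each number into an integer index, so that multiplication becomes addition of indices.

```python
import math

def decade_series(per_decade=24):
    """Geometric series covering one decade: 10**(i / per_decade), i = 0..n-1."""
    return [10 ** (i / per_decade) for i in range(per_decade)]

def to_index(x, per_decade=24):
    """Nearest series index of x, i.e. its rounded logarithm to the base
    of the step factor 10**(1 / per_decade)."""
    return round(per_decade * math.log10(x))

def from_index(i, per_decade=24):
    """Series value belonging to an index (indices beyond one decade allowed)."""
    return 10 ** (i / per_decade)

def multiply(x, y, per_decade=24):
    """Approximate product obtained by adding series indices."""
    return from_index(to_index(x, per_decade) + to_index(y, per_decade),
                      per_decade)

if __name__ == "__main__":
    print([round(v, 3) for v in decade_series(24)[:6]])
    print(to_index(2), to_index(5), multiply(2, 5))
```

The product is only as accurate as the rounding of each factor to the nearest series member, which is the price paid for the reduced arithmetical labor.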


P. IHM (Freiburg/Br.): Statistical Analyses using Electronic Computers. 


Utilization of electronic computers by biological and medical 
statisticians is advocated. The time-consuming programming can 
be facilitated by constructing appropriate computation schemes. 
To avoid errors the programming itself should be left to the machine.— 
For the case of the analysis of variance a formalized language could be 
found by means of which the mathematical formulas could be translated 
into a form readily digestible for the machine, the latter then setting up 
its own complicated computational programme for performing the 
analysis of variance. 




493 P. KUHNE (Berlin-Charlottenburg): A Method for the Estimation 
of the Clinical Significance of Differences. 


The usual methods of estimating the significance of differences 
in many problems of clinical research show the paradoxical disadvantage 
of giving the less relevant results the larger the number of cases is. 
This occurs in spite of an actually significant result obtained. This 
empirical impression is explained by the fact that clinicians in many of 
their problems (contrary to pharmacologists and to pathophysiological 
research workers) are not interested in a systematic difference or in 
a systematic deviation of the mean values but that they want to obtain 
an estimate within certain confidence limits of the prospective reaction 
of their future patients. With increasing numbers now ever smaller 
systematic differences are becoming measurable. These differences, 
however, tend to mean less and less in regard to the predictability of 
the reaction of individual patients. For overcoming this difficulty in 
the statistical evaluation of clinical and therapeutic problems on the 
basis of parametric methods for the difference $\bar{x} - 0$ (a statistic that 
therapeutic changes generally can be reduced to) confidence limits 
have been calculated for the empirical ratio $\bar{x}/s$ with a view to obtaining 
an estimate of the prospective reaction of future cases: 


ec = confidence limits chosen 
distance » — 0 in terms of normal distribution chosen for range of 
prediction. 


n 


Thus the question can be tested if any arbitrarily chosen proportion 
of cases at least can be expected to follow qualitatively the direction 
of the deviation of the mean indicated by the systematic result. The 
respective values have been tabulated for 37 df’s, the predictive ranges 
of 2/3, 19/20 and 99/100 and the limits: o.,; and po; as well as pio - 
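
The paradox described in this abstract can be made concrete with a small computation. The sketch below is not the tabulated procedure of the abstract (its confidence limits are not reproduced here); it assumes normally distributed individual reactions and uses the plug-in value of the ratio of mean to standard deviation. It shows that the significance statistic grows with the number of cases while the estimated share of individuals reacting in the direction of the mean does not.

```python
import math

def t_statistic(mean, sd, n):
    """One-sample t statistic for the difference of the mean from zero."""
    return mean / sd * math.sqrt(n)

def share_in_direction_of_mean(mean, sd):
    """Plug-in estimate of the proportion of future cases whose reaction
    has the same sign as the mean, assuming a normal distribution."""
    z = abs(mean) / sd
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

if __name__ == "__main__":
    mean, sd = 0.1, 1.0   # hypothetical small systematic difference
    for n in (25, 100, 10000):
        print(n, round(t_statistic(mean, sd, n), 2),
              round(share_in_direction_of_mean(mean, sd), 3))
```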


494 G. A. LIENERT (Marburg): On the Question of Testing Differ- 
ences in Variances when the Variates Are Paired. 


In pharmacopsychological, clinical and other studies the observation 
of a change in variance raises the question of how to test this change 
when the variates are paired. The variance-ratio F-test is presumably 
insensitive due to loss of information supplied by knowledge of correla- 
tion between the variates and is furthermore inapplicable if the variates 
are distributed non-normally. The necessity of an appropriate test 
in the non-parametric case is demonstrated along with some suggestions 
for solution of the problem. 
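
For orientation, one classical parametric way of using the correlation between paired variates is the Pitman-Morgan test, which rests on the identity cov(x + y, x − y) = var(x) − var(y). The sketch below implements that test; it is named here only as a point of comparison and is not claimed to be among the suggestions made in the abstract, which concern the non-parametric case.

```python
import math

def pitman_morgan(x, y):
    """Correlation r between sums and differences of paired variates and the
    statistic t = r * sqrt((n - 2) / (1 - r^2)), referred to Student's t
    with n - 2 degrees of freedom under bivariate normality.

    r > 0 indicates var(x) > var(y); r < 0 indicates the reverse.
    Raises ZeroDivisionError if the sums or the differences are constant,
    or if |r| equals 1.
    """
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need paired samples of equal length, n >= 3")
    u = [a + b for a, b in zip(x, y)]
    v = [a - b for a, b in zip(x, y)]
    mu, mv = sum(u) / n, sum(v) / n
    suv = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    suu = sum((a - mu) ** 2 for a in u)
    svv = sum((b - mv) ** 2 for b in v)
    r = suv / math.sqrt(suu * svv)
    t = r * math.sqrt((n - 2) / (1.0 - r * r))
    return r, t

if __name__ == "__main__":
    before = [1, 2, 3, 4, 5]      # hypothetical paired observations
    after = [2, 5, 4, 9, 10]
    print(pitman_morgan(before, after))
```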


495 Ch. MULLER (Berlin): Correlational Studies on Human Radii. 


Analysis of correlation between radius length and caput radii diameter 
of 533 archaic radii makes possible estimation of radius length and, in 
addition, body height from caput radii diameter. 


496 JERZY NEYMAN (Univ. of Calif., Berkeley, Calif., U.S.A.): 
The Problem of Distinguishing between Selection Effect and 
the Cumulative Effect of a Prophylactic Treatment. 


Consider a population of individuals (perhaps the population of 
children of a specified age, sex, etc.) subject to a certain risk (perhaps 
to the risk of contracting poliomyelitis within a given time). At a 
particular moment a sample of N individuals is drawn out of the 
population, each individual is subjected to prophylactic treatment and is kept 
under observation for a specified number s of units of time. Specifically, 
the prophylactic treatment (perhaps an injection of a vaccine) is 
applied at the beginning of each unit of observation to all the individuals 
who up to that time did not succumb to the risk contemplated. The 
observations available are represented by the numbers $n_i$ of those 
individuals who escape the risk over exactly $i$ full units of observation, 
for $i = 0, 1, \ldots, s$. With this notation, the difference $n_i - n_{i+1}$ 
represents the number of individuals who succumb to the risk within the 
$(i+1)$st unit of time and the quotient $q_i = (n_i - n_{i+1})/n_i$ is an estimate 
of the conditional probability $p_i$ of succumbing to the risk within the 
$(i+1)$st unit, given that the individual concerned escaped the risk 
through the first $i$ units of time. On many occasions the quotients $q_i$ 
exhibit a tendency to decrease with an increase in $i$ so that, typically, 
$q_i > q_{i+1}$. This tendency suggests that the probabilities $p_i$ also decrease, 
$p_i > p_{i+1}$. This latter phenomenon may be due to two different factors. 
One of them (i) is the selection effect. This selection takes place if the 
individuals forming the population are either unequally susceptible to 
the risk studied or live under varying degrees of exposure to the risk, 
or both. In consequence, their individual probabilities of succumbing 
to the risk are different. The individuals that succumb to the risk 
within an early unit of observation are frequently those with high 
probability of succumbing and the remainder of the sample has a 
decreased average probability of succumbing to the risk. The second 
factor (ii) capable of producing a decrease in probabilities $p_i$ is the 
possible cumulative effect of the treatment. The problem treated is 
that of distinguishing between the factors (i) and (ii). 
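
The selection effect can be seen in a small calculation. In the sketch below the population is assumed, purely for illustration, to consist of two groups whose members each have a constant probability of succumbing per unit of time, so that there is no cumulative treatment effect at all; the quotients q_i computed from the expected numbers of survivors nevertheless decrease. Group fractions, probabilities, and function names are hypothetical.

```python
def conditional_quotients(n):
    """q_i = (n_i - n_{i+1}) / n_i for survivor counts n_0, ..., n_s."""
    return [(n[i] - n[i + 1]) / n[i] for i in range(len(n) - 1)]

def expected_survivors(N, s, groups):
    """Expected numbers n_0..n_s escaping the risk through i units of time.

    groups -- list of (fraction, p) pairs; each member of a group succumbs
              with constant probability p in every unit of time.
    """
    return [N * sum(w * (1.0 - p) ** i for w, p in groups)
            for i in range(s + 1)]

if __name__ == "__main__":
    mixed = expected_survivors(1000, 4, [(0.5, 0.40), (0.5, 0.05)])
    homogeneous = expected_survivors(1000, 4, [(1.0, 0.20)])
    print([round(q, 3) for q in conditional_quotients(mixed)])
    print([round(q, 3) for q in conditional_quotients(homogeneous)])
```

A decreasing sequence of quotients is therefore not by itself evidence of a cumulative effect, which is the difficulty the abstract addresses.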




RENGER (Berlin): On the Inciting Conditions of Galvanic Skin Reflex. 


Analysis of the influence of vegetative tonus and consciousness on 
incitation of galvanic skin reflex. 


498 W. SCHAFER (Bensheim): On the Error of Confidence Intervals. 


In confidence interval estimation with discrete distributions the 
actual error more or less deviates from the preassigned level. Optimal 
confidence intervals are defined in the sense that the error approaches 
the preassigned value from the lower side most closely. Formulae are 
given for the binomial case. 
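
That the actual error of an interval for a discrete distribution deviates from the preassigned level can be verified by direct enumeration. The sketch below does this for the binomial case using the familiar normal-approximation interval, chosen only because it is short to write down; it is not the optimal interval defined in the abstract, and unlike that interval its error can exceed the preassigned value.

```python
import math
from statistics import NormalDist

def normal_approx_interval(x, n, level=0.95):
    """Normal-approximation interval for a binomial proportion."""
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    p_hat = x / n
    half_width = z * math.sqrt(p_hat * (1.0 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width

def actual_coverage(n, p, level=0.95):
    """Exact probability that the interval covers p, obtained by summing
    binomial probabilities over all outcomes whose interval contains p."""
    total = 0.0
    for x in range(n + 1):
        lower, upper = normal_approx_interval(x, n, level)
        if lower <= p <= upper:
            total += math.comb(n, x) * p ** x * (1.0 - p) ** (n - x)
    return total

if __name__ == "__main__":
    for p in (0.1, 0.3, 0.5):
        print(p, round(actual_coverage(20, p), 4))
```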


499 SCHRODER (Leipzig): Missing Plot Estimates by Means of 
Covariance Analysis in Fertilizer Field Experiments on Majorana 
hortensis MOENCH. 


Investigation of advisability of missing plot estimation by covariance 
analysis on the basis of 8 field experiments. Along with the usual 
assumptions in the analysis of variance the following prerequisites must 
be met: (1) The numbers of missing plots must be independent of the 
experimental question under consideration. (2) The missing plot mean 
squares within experiments must be homogeneous. (3) The effects 
of the missing plot values on yield must be linear.—A significant influ- 
ence of fertilizer on number of missing plots was demonstrated, which 
made analysis of covariance inapplicable. It is suggested to look out 
for similar situations in other field trials to avoid false information. 


500 K. SOLTH (Marburg): On the Relation Between Variance 
Components in the Case of Grouped and Ungrouped Correlated 
Variates. 


It is demonstrated that the relations between total variance, variance 
due to regression and the remainder stay unaltered under grouping of 
variates, though the absolute values of the variance components change 
under grouping. 


501 D. STEPHAN (Darmstadt): Statistical Practice with Digital 
Electronic Computers. 


Programmes illustrating the use of the IBM 650 electronic computer 
in statistical analysis: Calculation of statistics, simple analysis of 
variance, chi-square test, simple and multiple linear correlation. Notes 
on programming and utility of electronic computation. 




502 F. SULLWOLD (Frankfurt/M.): On the Use of Correlation 
Methods in the Construction of Psychological Tests. 


Single questions are combined into tests under statistical considerations, 
correlation principles playing a prominent rôle. The general 
logical consistence of the different correlation methods used follows 
from the demand for maximal reliability and validity of tests.—The 
reliability of a test is measured by the coefficient of reliability, the 
latter expressing the test autocorrelation which can be estimated by 4 
different methods.—The determination of reliability of a test necessitates 
consideration of correlations between the single questions and an ade- 
quate external criterion and the intercorrelations of questions. The 
validity of a test battery is judged particularly by multiple correlation 
methods.—Mainly biserial and point biserial correlations, tetrachoric 
correlations, and Phi-coefficients are used for question analysis. Choice 
of method follows from certain indications and needs circumspection. 


503 H. WAGNER (Bad Elster): The Interdiurnal Variation of 
Temperature and its Variance as a Measure With Bioclimatological 
Studies. 


Individual variation can be eliminated by considering deviations. 
To compensate for diurnal rhythm, 24-hour differences must be used. 
Approximate analyses can be done by means of frequency graphs. 
Comparison of mean values is performed by the difference method. 
The variance of 24-hour changes yields further information, particularly 
with respect to random fluctuations within certain time intervals. 
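
The use of 24-hour differences can be sketched in a few lines. The example assumes hourly readings held in a plain list and hypothetical values; it shows that differencing at a lag of 24 hours removes a fixed diurnal rhythm, leaving the day-to-day change whose variance the abstract proposes as a measure.

```python
from statistics import mean, variance

def interdiurnal_changes(hourly, lag=24):
    """Differences between each reading and the reading `lag` hours earlier."""
    return [hourly[t] - hourly[t - lag] for t in range(lag, len(hourly))]

if __name__ == "__main__":
    # Hypothetical data: a fixed diurnal pattern plus a rise of one degree per day.
    pattern = [10 + (h if h <= 12 else 24 - h) for h in range(24)]
    readings = [pattern[t % 24] + t // 24 for t in range(24 * 4)]
    changes = interdiurnal_changes(readings)
    print(mean(changes), variance(changes))
```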


504 E. WALTER (Göttingen): Rank Correlation Methods. 


Critical review of different rank correlation coefficients. 


505 R. WETTE (Heidelberg): On the Biological Interpretation of the 
Logarithmic Series Distribution. 


The logarithmic series distribution of the number of species per 
genus etc. can be obtained by the following model: The probability that 
a taxonomist adds or excludes a taxonomical sub-unit (e.g. species) to 
or from a higher unit (e.g. genus) during the time interval t, t + dt, 
is taken to be proportional (β, μ) to the number of sub-units per unit and dt. 
In contrast to the ordinary birth-and-death process, the taxonomist 
prevents a taxonomical unit from vanishing by allotting its last 
member to another unit during a revision (elastic barrier, g). The 
resulting stochastic process can be described by the differential equation 
for its p.g.f. 


F(s, t) = (s — 1)-(8s — t) + — 


and yields a logarithmic series distribution in the stationary limit. 
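
For reference, the logarithmic series distribution named here has probabilities proportional to θ^k / k for k = 1, 2, .... The sketch below evaluates these probabilities and the mean; the symbol θ and the function names are assumptions of this illustration, and the relation of θ to the rates of the model above is not given in the abstract.

```python
import math

def log_series_pmf(k, theta):
    """P(K = k) = -theta**k / (k * ln(1 - theta)), k = 1, 2, ..., 0 < theta < 1."""
    if not 0.0 < theta < 1.0 or k < 1:
        raise ValueError("need 0 < theta < 1 and k >= 1")
    return -theta ** k / (k * math.log(1.0 - theta))

def log_series_mean(theta):
    """Mean number of sub-units per unit: -theta / ((1 - theta) * ln(1 - theta))."""
    return -theta / ((1.0 - theta) * math.log(1.0 - theta))

if __name__ == "__main__":
    theta = 0.9   # hypothetical parameter value
    print([round(log_series_pmf(k, theta), 4) for k in range(1, 6)])
    print(round(log_series_mean(theta), 3))
```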


F. WILHELM (Graz/Oesterreich): Evaluation of Food Utilization 
Power (f.u.p.). 


Application of Petitpierre’s analysis of variance for analysis of 
f.u.p. in different cattle breeds of the Steiermark. Comparison of 
fed starch values and produced milk with respect to influence of climate 
on carotin content of food. 




THE BIOMETRIC SOCIETY 


IUBS 


The Society was represented at the 13th General Assembly of the 
International Union of Biological Sciences, held in London on July 
12-14, 1958, by A. A. Buzzati-Traverso, M. J. R. Healy, and J. O. Irwin. 
The new President of the IUBS is Professor G. Montalenti, President 
of the Italian Region of the Biometric Society. 


Région Belgique et Congo Belge 

The Region has sustained a tragic loss by the unexpected death of 
the Regional President, Professor R. Laurent, on April 16th, 1958. A 
brief obituary appears elsewhere in this issue. Professor P. P. Denayer 
has been elected Regional President in his place. 


ENAR 


The Region held a joint meeting with the Institute of Mathematical 
Statistics at Gatlinburg, Tenn., on April 10-12, 1958. A highlight of the 
meeting was an address by Dr. C. I. Bliss entitled ‘The First Decade of 
the Biometric Society.’ Following this address, Dr. Bliss was presented 
on behalf of the Society with a desk set, inscribed: 


“For unselfish devotion, inspiring leadership, and untiring service 
in the founding, maintenance, and growth of the Biometric Society, 
the members are proud and honoured to present this token of their 
appreciation.” 


Other papers given at the meeting included—G. E. P. Box: Recent 
Work on Non-Linear Estimation and Design; R. C. Cornell: Estimation 
for Linear Combination of Exponentials; M. B. Wilk: Non-Linear 
Hypotheses; W. S. Connor: The Matrix Direct Product in the Analysis 
of Factorials; W. A. Glenn and C. Y. Kramer: Randomized Blocks with 
Missing Observations; V. L. Mote, M. V. Pavate, and R. L. Anderson: 
Analysis of Contingency Tables; E. J. Williams: Optimum Allocation for 
Polynomial Regression; R. A. Bradley: Designs in Taste-Testing; J. J. 
Gart: Sequential Decision Procedure for Survival Curves; R. J. Freund 
and R. W. Vail, Jr.: Residual Analysis; C. W. Clunies-Ross: Mixed 
Exponential Failure Distribution; J. R. Duffett: Estimating System 
Reliability from Component Reliabilities; W. N. Carey, Jr. and P. E. 
Irick: The AASHO Road Test; M. A. Kastenbaum: Estimating Sperm 
Frequencies in Drosophila; F. G. Martin, Jr. and C. C. Cockerham: 


442 


| 
| 
|_| 


THE BIOMETRIC SOCIETY 443 


High-speed Computers in Empirical Selection Studies; P. A. Miller, 
J. C. Williams, and H. F. Robinson: Genotype-Environment Interaction 
Variance Components in Cotton Breeding; T. Kelleher: Diallel Cross 
Methodology; H. A. David: Paired Comparisons and Tournaments; 
R. E. Bargmann: Dependence in Multivariate Analysis. 

A list of those attending the meeting was published in the June 1958 
issue of Biometrics. 


Région Française 


At a meeting held in Paris on May 21, the following contributions 
were given: J. Raison: Non-parametric tests; R. Meigniez: 
Test of the number of maxima. 


CHANGES IN MEMBERSHIP 
(April-June, 1958) 


Changes of Address 


Mr. B. L. Adkins, Department of Statistics, University of Melbourne, 
Carlton N. E., Victoria, Australia 

Mr. John C. Bain, 25 South Munn Avenue, Apartment 310, East 
Orange, New Jersey, U.S.A. 

Dr. Huldah Bancroft, 819 East 58th Street, Richmond, Virginia, U.S.A. 

Mr. David Bruce, 1517 Sixth Street, Alexandria, Louisiana, U.S.A. 

Dr. Paul T. Bruyere, R.F.D. 3, Gaithersburg, Maryland, U.S.A. 

Mr. Melvin W. Carter, Department of Experimental Statistics, North 
Carolina State College, Raleigh, North Carolina, U.S.A. 

Prof. Gerald J. Cox, 4731 Stanton Avenue, Pittsburgh 1, Pennsylvania, 
U.S.A. 

Mr. William F. Elkin, 1308 Yerkes Street, Philadelphia 19, Pennsylvania, 
U.S.A. 

Mr. George E. Ferris, 236 Orleans Avenue, Battle Creek, Michigan, 
U.S.A. 

Mr. Donald R. Fiester, U. S. Embassy, Guatemala City, Guatemala, 
Central America 

Dr. N. R. Fraser, Agricultural Research Station, Nelspruit, Eastern 
Transvaal, South Africa 

Dr. Benson Ginsberg, Behavioral Sciences Center, 202 Junipero Serra 
Boulevard, Stanford, California, U.S.A. 

Mr. Yves Graff, 6 Rue du Pont, Saint-Jacques, Caen, France 

Dr. Roy C. Hackman, Psychological Service of Pittsburgh, 902 Park 
Building, Pittsburgh 22, Pennsylvania, U.S.A. 




Mr. K. P. Haydock, 985 Waterworks Road, The Gap WS, Queensland, 
Australia 

Mrs. Lee Herrera, 40-45 Elbertson Street, Elmhurst, Long Island, New 
York, U.S.A. 

Dr. Herbert O. Hetzer, A. H. Research Division, Agricultural Research 
Center, Beltsville, Maryland, U.S.A. 

Dr. Henry Hopp, Agricultural Attache, American Embassy, Bogota, 
Colombia 

Dr. Theodore W. Horner, c/o General Mills, Inc., 9200 Wayzata Boulevard, 
Minneapolis 26, Minnesota, U.S.A. 

Prof. Shinya Iyami, Department of Applied Genetics, National Institute 
of Genetics, Mishima, Shizuoka-ken, Japan 

Mr. Masao Kiyoku, Faculty of Agriculture, Okayama University, 
Tsusima, Okayama City, Japan 

Mr. Richard A. Lamm, 44 Brightwood Avenue, Pearl River, New 
York, U.S.A. 

Dr. Henri Louis LeRoy, Tannstrasse 8, Thalwil, Switzerland 

Mr. Tobias Lewis, Statistical Laboratory, University of Manchester, 
Manchester 13, England 

Mr. E. Liang, Botany Department, University College of Ghana, 
Achimota, Ghana 

Mr. Nicholas E. Manos, Chief Statistician, Air Pollution Medical 
Program, Division of Special Health Service, Department of Health, 
Education and Welfare, Washington 15, D. C., U.S.A. 

Dr. Margaret P. Martin, Department of Preventive Medicine, Upstate 
Medical Center, 766 Irving Avenue, Syracuse 10, New York, U.S.A. 

Miss Ethelyne McBee, P.O. Box Uleta Branch, 475 N.E. 167th St., 
Miami, Florida, U.S.A. 

Mr. P. A. Parsons, Department of Agronomy, University of California, 
Davis, California, U.S.A. 

Prof. Roger G. Peterson, 204 Snell Hall, Oregon State College, Corvallis, 
Oregon, U.S.A. 

Mrs. Mary E. Ready, 206 Wakley Terrace, Bel Air, Maryland, U.S.A. 

Mr. R. J. Rowlands, Box 30, P.O., West Brunswick, Victoria, Australia 

Dr. William F. Royce, Fisheries Research Institute, University of 
Washington, Seattle 5, Washington, U.S.A. 

Dr. H. Fairfield Smith, Statistical Center, U.P., Post Office Box 479, 
Manila, Philippines 

Mr. Robert Teichman, North Carolina State College, Raleigh, North 
Carolina, U.S.A. 

Prof. Alan E. Treloar, 2237 Schiller Avenue, Wilmette, Illinois, U.S.A. 

Miss Sarah F. Welch, 1523 Clifton Road, N.E., Atlanta, Georgia, 
U.S.A. 




New Members 
At Large 


Mr. M. A. Guzman, Institute of Nutrition of Central America and 
Panama, Guatemala City, Guatemala, Central America 

Dr. Ireneusz Juvanez, VIII Koranyi Sandor u. 2/a, Budapest, Hungary 

Mr. D. K. Dutta Roy, Research Division, Ministry of Agriculture, 
(Wad-Medani) Sudan 


Australasian 


Mr. K. M. Cellier, c/o Division of Mathematical Statistics, C.S.I.R.O., 
University of Adelaide, Adelaide, Australia 

Mr. G. W. Hill, C.S.R.A.C., Computing Laboratory, University of 
Melbourne, Carlton N3, Victoria, Australia 


Belgian 


Mr. Jacques Bredas, I.N.E.A.C.-KM 5, Yangambi-via Stanleyville, 
Belgian Congo 


British 


Mr. W. C. Billewiez, M.R.C. Institute Medical Research Unit, Midwifery 
Department, Medical School, Forest Hill, Aberdeen, Scotland 

Dr. C. O. Carter, Clinical Genetics Research Unit, The Hospital for 
Sick Children, Great Ormond, London W.C., England 

Dr. B. Cromie, c/o John Wyeth and Bro., Ltd., Clifton House, Euston 
Road, London NW 1, England 

Mr. G. J. Davies, c/o A.R.C. Unit of Statistics, University of Aberdeen, 
Meston Walk, Old Aberdeen, Scotland 

Dr. S. E. Dicker, Pharmacology Department, University College 
London, Gower Street, London WC 1, England 

Dr. J. N. Hunt, Guy's Hospital, London SE 1, England 

Mr. J. H. Ince, 7 Follyfield Road, Banstead, Surrey, England 

Mr. R. H. E. Inkson, 30 Springfield Gardens, Aberdeen, Scotland 

Mr. C. R. B. Joyce, Department of Pharmacology, London Hospital 
Medical College, Turner Street, London E. 1, England 

Mr. R. J. Ladd, Pharmacology Department, University College London, 
Gower Street, London WC 1, England 

Prof. W. D. M. Paton, Department of Pharmacology, Examination 
Hall, Queen Hall, London WC 1, England 

Mr. N. W. Please, Statistics Department, University College, Gower 
Street, London WC 1, England 

Dr. I. Schire, c/o Smith, Kline and French Laboratories, Ltd., 120 
Cold Harbor Lane, London SE 5, England 




Dr. Walter G. Smith, 174 Grove Park, Knutsford, Cheshire, England 

Dr. Frances E. Williams, Pharmacology Department, Pfizer Ltd., 
Sandwich, Kent, England 

Dr. M. E. Wise, 8 Arundel Avenue, Morden, Surrey, England 

Prof. Ronald Woolmer, Research Department of Anesthetics, Royal 
College of Surgeons of England, Lincoln’s Inn Fields, London WC 2, 
England 


Danish 


Dr. E. W. Andersen, Niels Andersens Vej 76, Hellerup, Denmark 

Dr. F. S. Andersen, Statens Skadedyrlaboratorium, Springforbi, 
Denmark 

Dr. Viggo Dyrberg, Ledreborg Alle 40, Gentofte, Denmark 


ENAR 

Dr. Robert W. DeBaun, American Cyanamid Company, 1937 W. Main 
Street, Stamford, Connecticut, U.S.A. 

Mr. Arthur P. Dempster, Bell Telephone Laboratories, Murray Hill, 
New Jersey, U.S.A. 

Mr. Robert Fitzpatrick, 410 Amberson Avenue, Pittsburgh 32, Pennsyl- 
vania, U.S.A. 

Dr. Carl E. Marshall, Statistical Laboratory, Oklahoma State Uni- 
versity, Stillwater, Oklahoma, U.S.A. 

Mr. G. McLoughlin, P. O. Box 757, Edgewood, Maryland, U.S.A. 

Mr. William Mead, 4011 Morrison Drive, Lynchburg, Virginia, U.S.A. 

Dr. Paul D. Minton, 2612 Rosedale, Dallas 5, Texas, U.S.A. 

Mr. Peter H. Ovenburg, Department of Zoology, University of Michigan, 
Ann Arbor, Michigan, U.S.A. 

Mr. Joe Powell, Jr., 2429 E. Third Street, Chattanooga, Tennessee, 
U.S.A. 

Mr. Richard C. Trimble, 605 North Irving Street, Apartment 27, 
Arlington, Virginia, U.S.A. 


French 


Mr. Bernard Cyffers, 24 rue du Hameau, Ingenieur au Service d’Exploitation 
Industrielle des Tabacs et des Allumettes, Paris, France 

Madame Helene Mouriesse, 37 rue de Babylone, Aide-technique au 
Service des Recherches Biologiques du S.E.I.T.A., Paris 7e, France 

Mr. Rives, Charge de Recherches, Station de Recherches Viticoles du 
Centre de Recherches Agronomiques du Sud-Ouest, La Grande 
Ferrande, Pont-de-la-Maye (Gironde), France 




German 


Dipl. Math. Gunther Hox, Kerckhoffstr. 16, Essen-West, Germany 

Dr. Gisela Reissig, Hubnerstr. 2, Dresden A1, Germany 


Indian 


Mr. S. K. Bose, Joint Director, Central Statistical Organization, B-Barracks, 
Janpath, New Delhi-1, India 

Dr. Uttam Chand, Officer of Special Duty, Central Statistical Organi- 
zation, B-Barracks, Janpath, New Delhi-1, India 

Dr. U. S. Nair, Professor of Statistics, Kerala University, Trivandrum, 
S. India 


Japanese 

Mr. Takeo Abe, 2448, Setagaya 3, Setagaya-ku, Tokyo, Japan 

Mr. Tashiro Haga, Sanyo Pulp Co., Ltd., Marunouchi, Chiyoda-ku, 
Tokyo, Japan 

Mr. Shizuo Ito, 226, Tamagawa-Okusawa 1, Setagaya-ku, Tokyo, 
Japan 

Mr. Sotoshi Shinbo, Oyaguchi-machi, Itabashi-ku, Tokyo, Japan 


Netherlands 


Dr. J. van Noordwijk, Farmacotherapeutisch Laboratorium, Polderweg 
20, Amsterdam, Netherlands 


WNAR 


Mr. Purna Chandra, Bacteriology Department, Oregon State College, 
Corvallis, Oregon, U.S.A. 

Dr. Rex L. Hurst, Department of Applied Statistics, Utah State 
University, Logan, Utah, U.S.A. 

Dr. Emanuel Parzen, Stanford University, Department of Statistics, 
Stanford, California, U.S.A. 




NEWS AND ANNOUNCEMENTS 


Members are invited to transmit to their National or Regional Secretary 
(if members at large, to the General Secretary) news of appointments, 
distinctions, or retirements, and announcements of professional interest. 


Alan T. James of the Division of Mathematical Statistics, Common- 
wealth Scientific and Industrial Research Organization, Australia, will 
be Visiting Lecturer at Yale University for the academic year 1958-59. 

Jerome Cornfield, formerly Assistant Chief of the Biometrics Branch 
of the Division of Research Service at the National Institutes of Health, 
has been appointed to two professorships at the Johns Hopkins Medical 
Institutions, effective July 1, 1958. He is Professor and Chairman of the 
Department of Biostatistics in the School of Hygiene and Public Health, 
succeeding William G. Cochran. He also has a newly created appoint- 
ment, that of Professor of Biomathematics in the School of Medicine. 
This dual appointment reflects the increasing importance attached to 
the field of applied mathematics in these institutions and the strength 
envisioned if there is a unified development. 

Gertrude Mary Cox was awarded an honorary Doctor of Science 
degree by Iowa State College during its Founder’s Day centennial 
observance; she was cited as “teacher, researcher, leader, and 
administrator in the field of statistics.” 

George Waddell Snedecor was awarded an honorary Doctor of Science 
degree by Iowa State College during its Founder’s Day centennial 
observance and cited as “teacher, author, pioneer in experimental 
statistics.” 

Huldah Bancroft retired from her position as Professor of Biostatistics 
at Tulane University, New Orleans, Louisiana, on July 1, 1958. She has 
been appointed Emeritus Professor of Biostatistics and will make her 
future home in Richmond, Virginia. Miss Bancroft has been a staunch 
member of the Society for many years. Her new address will be: 819 
East 58th Street, Richmond 24, Virginia. 


BIOMETRIC TRAINING PROGRAM EXPANDED AT IOWA 
STATE COLLEGE 


The Department of Statistics and the Statistical Laboratory of 
Iowa State College will substantially expand their present graduate 
training program in biostatistics with the aid of a five-year grant from 
the National Institutes of Health. This award will provide support 
for several graduate students in statistics per year as candidates for the 
M.S. or Ph.D. degree, with a view to stimulating their interest in 
biometry, medical statistics, or public health as a career. It will also 
give partial support to one staff member so that he can devote more 
time to those areas of statistical application. 

One feature of the expanded program is that biostatistics trainees, 
while working toward masters’ or doctors’ degrees in statistics, will 
spend up to three months each year at some selected medical school 
or public health center to round out their experience through contact 
with biometric data in the field or laboratory. So far, three new trainee- 
ships have been established for the 1958-59 year. Further details about 
the expanded biostatistics program and application forms for trainee- 
ships for the 1959-60 year may be obtained from the Department of 
Statistics, Iowa State College, Ames, Iowa. 


OKLAHOMA FRONTIERS OF SCIENCE FOUNDATION 
GRANTS ANNOUNCED 


Three cash grants totaling $6,800.00, marking the initiation of a 
new research support program by the Frontiers of Science Foundation 
of Oklahoma, Inc., have been announced by Mr. D. A. McGee, President 
of the Foundation. 

Dr. Sheridan H. Lee, Professor of Biology, Oklahoma Baptist 
University, Shawnee, Oklahoma, received a grant of $3,000 for support 
of an investigation of Oklahoma nematodes. A grant of $800 was made 
to Dr. Fred W. Allen, Professor of Biology, Southwestern State College, 
Weatherford, Oklahoma, for support of a study of teachers and teaching 
facilities in secondary school science, and a grant of $3,000 was made 
to Dr. Beryl E. Clotfelter, Assistant Professor of Physics, also of 
Oklahoma Baptist University, for support of an investigation of the 
conductivity of Pyrex glass containing hydrogen plasma. 

The new program will provide support for individual investigators 
located in Oklahoma’s smaller colleges and universities. The grants 
are limited to Oklahoma and to the fields of natural sciences, mathe- 
matics, science education, and mathematics education. 


FELLOWSHIPS IN PSYCHOMETRICS 


The Educational Testing Service, Princeton, New Jersey, is offering 
for 1959-60 its twelfth series of research fellowships in psychometrics 
leading to the Ph.D. degree at Princeton University. Open to men who 
are acceptable to the Graduate School of the University, the two fellow- 
ships each carry a stipend of $2,650 a year and are normally renewable. 
Fellows will be engaged in part-time research in the general area of 
psychological measurement at the offices of the Educational Testing 
Service and will, in addition, carry a normal program of studies in the 
Graduate School. 

Suitable undergraduate preparation may consist either of a major 
in psychology with supporting work in mathematics, or a major in 
mathematics together with some work in psychology. However, in 
choosing fellows, primary emphasis is given to superior scholastic attain- 
ment and research interests rather than to specific course preparation. 

The closing date for completing applications is January 2, 1959. 
Information and application blanks will be available about September 
15 and may be obtained from: Director of Psychometric Fellowship 
Program, Educational Testing Service, 20 Nassau Street, Princeton, 
New Jersey. 


NEW DEPARTMENT OF GENETICS 


Effective July 1, 1958, Dr. H. F. Robinson will become head of the 
newly organized Department of Genetics at North Carolina State 
College, Raleigh, North Carolina. The department will consist of the 
genetics faculty previously administered in the division of Biological 
Science and the quantitative geneticists in the Institute of Statistics 
and Department of Experimental Statistics. In addition to expansion 
of the research and teaching functions within the Department of Ge- 
netics, emphasis will be devoted to coordinating the work with genetics 
in other departments. Twenty-two geneticists in the applied breeding 
fields are associate members of the new department. 

Evolution, cytogenetics, and quantitative genetics are the major 
areas of research emphasis with close interdepartmental coordination 
with the plant and animal breeding programs. Dr. S. G. Stephens, 
previously in charge of the genetics faculty, has been awarded a Gug- 
genheim travel grant to collect and study species in Central America 
and will devote full time to this phase of research in which he has al- 
ready achieved international recognition. 




OBITUARY 


ROGER LAURENT 


In Roger Laurent, its president in office, the Belgian biometric 
society, the Société Adolphe Quetelet, loses one of its most eminent 
and most devoted members. Enrolled among the founders of an association 
whose object is the development of quantitative biology in its broadest 
aspects, Laurent thereby affirmed, as early as 1952, the interest that 
he took, as director of the Drug Control Service of the Belgian 
Pharmaceutical Association, in the resources that statistical methods 
offer the experimenter. It was indeed as a well-informed biologist that 
he directed this group of laboratories, whose scientific reputation and 
usefulness, under his impetus, soon became strikingly evident. Conscious 
of the responsibilities that weighed on his service, Laurent did not 
confine himself to instituting biological, bacteriological, or chemical 
controls of the samples submitted. Thanks to his initiative, the 
analysis of these samples was able to benefit from the guarantees 
afforded by the planning of investigations, that is to say, the 
statistical interpretation of numerical data. 

While my first contact with Roger Laurent dates back to the founding 
of our society, it was again at the recent statutory meeting of our 
association that he and I had our last conversation, before a lecture 
that he had proposed and that one of his collaborators was about to 
devote to the application of statistical methods to analyses and 
biological assays. The very choice of this subject shows how firmly the 
attention of our late President remained fixed on the role of 
quantitative biology in the activities of his Institute. 

All those whom a common mourning brings together here, all those who 
honor in Roger Laurent the man of science, bow also before the man of 
duty and the man of heart. We all know that if he applied himself with 
such zeal to his work and his duties, it was not only out of his passion 
for scientific research but also because of his social spirit, his will 
to serve and to help his fellow man. It is this twofold memory of its 
President that the Société Adolphe Quetelet will keep. May our unanimous 
sorrow and loyalty be, for his cruelly afflicted family, a testimony of 
our deep sympathy. 


P. Spehl 




JOURNAL OF THE 


AMERICAN STATISTICAL 
ASSOCIATION 


Volume 53 September, 1958 Number 283 


ARTICLES 


Influence of the Interviewer on the Accuracy of Survey Results 
ROBERT H. HANSON AND ELI S. MARKS 


Demand for Farm Products at Retail and the Farm Level: Some Empirical 
Measurements and Related Problems................... REX F. DALY 


Investment Estimates of Underdeveloped Countries: An Appraisal 
I. ABRAHAM 


Manufacturers’ Inventory Cycles and Monetary Policy.. DORIS M. EISEMANN 


Leading American Statisticians of the Nineteenth Century II 
J. FITZPATRICK 


Rectifying Inspection of a Continuous Output............. F. J. ANSCOMBE 
Ranking Methods and the Measurement of Attitudes........... R. JARDINE 


Randomization Tests for a Multivariate Two-Sample Problem 
J. H. AND D. A. S. FRASER 


A Method of Adjustment for Defective Data 
MORRIS JAMES SLONIM AND CHESTER H. MCCALL, JR. 


K. V. RAMACHANDRAN 


BOOK REVIEWS 
NOTES ABOUT AUTHORS 


PUBLICATIONS RECEIVED 


AMERICAN STATISTICAL ASSOCIATION 
1757 K Street, N.W., Washington 6, D.C. 


Information on memberships, subscriptions, and back numbers should be 
requested from the Executive Director, American Statistical Association, 
1757 K Street, N.W., Washington 6, D. C., U.S.A. 




INFORMATION FOR CONTRIBUTORS 


MANUSCRIPTS 


Contributions for Biometrics may be addressed to Dr. Ralph A. Bradley, Depart- 
ment of Statistics, Virginia Polytechnic Institute, Blacksburg, Virginia, U.S.A.; 
authors residing in the following Society Regions can expedite consideration of papers 
by submitting them to the appropriate Associate Editor, namely: BRITISH 
REGION: Dr. S. C. Pearce, East Malling Research Station, East Malling, Maidstone, 
Kent, England; AUSTRALASIAN REGION: Dr. E. A. Cornish, University of 
Adelaide, Adelaide, Australia; FRENCH REGION: Dr. Georges Teissier, Faculté 
des Sciences de Paris, 1 rue V. Cousin, Paris, France. QUERIES, NOTES, and 
related correspondence should be directed to Professor G. W. Snedecor, Statistical 
Laboratory, Iowa State College, Ames, Iowa, U.S.A. 

MANUSCRIPTS must be submitted in triplicate, with typescript doublespaced 
throughout. Marginal notes may obviate typographical difficulties presented by 
complicated formulae or tables—authors should not attempt editorial instructions 
or markings for the printer. TABLES should be identified by arabic number and 
by a short descriptive title. ILLUSTRATIONS should also be identified by arabic 
number and by a brief caption. (Captions should not be included in illustrations, 
but should be typewritten collectively on an accompanying sheet.) Originals 
should be approximately 8.5 x 11 in. (21.5 x 28 cm.). The original of each chart, 
diagram, or graph should be executed in black on white drawing paper or board, on 
blue tracing linen, or on coordinate paper ruled in blue only; coordinate lines to be 
reproduced should be ruled in black. For printing, illustrations may be reduced to 
¥ or 4 original dimensions. Lines should therefore be of sufficient thickness, and 
decimal points, periods, and stippled dots should be solid black circles large enough 
to reproduce well. Lettering and numerals should be at least 1 mm. high when 
reproduced in a cut 3 in. (7.5 cm.) wide. Photographs should be prints on glossy 
paper with strong contrasts, and if grouped in a plate should be mounted contig- 
uously. All tables and illustrations should be mentioned explicitly in the text. 


REFERENCES (BIBLIOGRAPHIC) should be collectively listed alphabetically 
by author; textual citation by author and year is preferred. 


ABSTRACTS 


Abstracts of papers presented at meetings of the Biometric Society or of its 
regions are printed in Biometrics following such meetings. They should be submitted 
to the person designated to receive them for a particular meeting in exactly the form 
published in Biometrics (except for an Abstract Number), doublespaced on bond 
paper, and in duplicate. Use of formulae requiring display printing is to be avoided. 


NOTICES, ANNOUNCEMENTS, AND BIOMETRIC SOCIETY REPORTS 


International and regional reports and notices should be submitted by the 
appropriate officers of the Society and its Regions in duplicate doublespaced on 
separate sheets exactly as they are to be printed in Biometrics. Other material to 
be printed in News and Announcements should also be submitted doublespaced 
and in duplicate. 


SUSTAINING MEMBERS OF THE BIOMETRIC SOCIETY 


Abbott Laboratories 

American Cancer Society, Inc. 

Merck, Sharp and Dohme Research Laboratories 
Schering Corporation 

Smith, Kline and French Laboratories 

E. R. Squibb and Sons 


Wyeth Institute of Applied Biochemistry 




BACK ISSUES 


Back issues of Biometrics are available at the following postage-paid 
prices in U.S.A. currency: 


Year    Volume    Numbers    Price per        Price per 
                             Single Number    Volume (unbound) 
1945       1      1 to 6        $1.00            $6.00 
1946       2      1 to 6         1.00             6.00 
1947       3      1 to 4         1.50             5.00 
1948       4      1 to 4         1.50             5.00 
1949       5      1 to 4         1.50             5.00 
1950       6      1 to 4         1.50             5.00 
1951       7      1 to 4         2.00             8.00 
1952       8      1 to 4         2.00             8.00 
1953       9      1 to 4         2.00             8.00 
1954      10      1 to 4         2.00             8.00 
1955      11      1 to 4         2.00             8.00 
1956      12      1 to 4         2.00             8.00 
1957      13      1 to 4         2.00             8.00 


Inquiries, non-member subscriptions, and orders for back issues should 
be addressed to: 
BIOMETRICS 
DEPARTMENT OF STATISTICS 
VIRGINIA POLYTECHNIC INSTITUTE 
BLACKSBURG, VIRGINIA, U.S.A. 


Reprints of individual articles are not available except to authors at the 
time of printing. Three special issues are among the numbers listed 
above. They are: 


1947 Volume 3 Number 1 The Analysis of Variance 
1951 Volume 7 Number 1 Components of Variance 
1957 Volume 13 Number 3 The Analysis of Covariance 




